Probability and 
Its Applications 


Bert Fristedt 
Lawrence Gray 


A Modern Approach 
to Probability Theory 


Springer Basel AG 


Probability and its Applications 


Series Editors 


Thomas Liggett 
Charles Newman 
Loren Pitt 


Springer Science+Business Media, LLC 


Bert Fristedt 

Lawrence Gray 

School of Mathematics 
University of Minnesota 
Minneapolis, MN 55455 
U.S.A. 


Library of Congress Cataloging-in-Publication Data 


Fristedt, Bert, 1937- 
A modern approach to probability theory / Bert Fristedt, 
Lawrence Gray. 
p. cm. -- (Probability and its applications) 
Includes bibliographical references (p. - ) and index. 


1. Probabilities I. Gray, Lawrence F. II. Title. III. Series. 
QA273.F92 1996 96-5687 
519.2--dc20 CIP 


Birkhäuser Boston 


Printed on acid-free paper 


© 1997 Springer Science+Business Media New York
Originally published by Birkhäuser Boston in 1997.


Copyright is not claimed for works of U.S. Government employees. 

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, 
or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, 
or otherwise, without prior permission of the copyright owner. 


Permission to photocopy for internal or personal use of specific clients is granted by
Birkhäuser Boston for libraries and other users registered with the Copyright Clearance
Center (CCC), provided that the base fee of $6.00 per copy, plus $0.20 per page is paid directly
to CCC, 222 Rosewood Drive, Danvers, MA 01923, U.S.A. Special requests should be
addressed directly to Birkhäuser Boston, 675 Massachusetts Avenue, Cambridge, MA 02139,
U.S.A.


ISBN 978-1-4899-2839-9 ISBN 978-1-4899-2837-5 (eBook) 
DOI 10.1007/978-1-4899-2837-5 


Typeset by the authors in AMS-LaTeX.
987654321 


Contents 


List of Tables .... xv

Preface .... xvi


Part 1. Probability Spaces, Random Variables, and Expectations


Chapter 1. Probability Spaces .... 3
1.1. Introductory examples .... 3
1.2. Ingredients of probability spaces .... 6
1.3. σ-fields .... 8
1.4. Borel σ-fields .... 9
Chapter 2. Random Variables .... 11
2.1. Definitions and basic results .... 11
2.2. R¢-valued random variables .... 14
2.3. R®-valued random variables .... 18
2.4. Further examples .... 20
Chapter 3. Distribution Functions .... 25
3.1. Basic theory .... 25
3.2. Examples of distributions .... 29
3.3. Some descriptive terminology .... 31
3.4. Distributions with densities .... 35
3.5. Further examples .... 37
3.6. Distribution functions for the extended real line .... 39




Chapter 4. Expectations: Theory .... 41
4.1. Definitions .... 41
4.2. Linearity and positivity .... 46
4.3. Monotone convergence .... 49
4.4. Expectation of compositions .... 51
4.5. The Riemann-Stieltjes integral and expectations .... 53
Chapter 5. Expectations: Applications .... 59
5.1. Variance and the Law of Large Numbers .... 59
5.2. Mean vectors and covariance matrices .... 66
5.3. Moments and the Jensen Inequality .... 68
5.4. Probability generating functions .... 70
5.5. Characterization of probability generating functions .... 73
Chapter 6. Calculating Probabilities and Measures .... 75
6.1. Operations on events .... 75
6.2. The Borel-Cantelli and Kochen-Stone Lemmas .... 77
6.3. Inclusion-exclusion .... 80
6.4. Finite and σ-finite measures .... 81
Chapter 7. Measure Theory: Existence and Uniqueness .... 85
7.1. The Sierpiński Class Theorem and uniqueness .... 85
7.2. Finitely additive functions defined on fields .... 87
7.3. Existence, extension, and completion of measures .... 90
TA Tee Cs oie tere as copia te ata e ea on eM hae ah ra 95
Chapter 8. Integration Theory .... 101
8.1. Lebesgue integration .... 101
8.2. Convergence theorems .... 105
8.3. Probability measures and infinite measures compared .... 110
8.4. Lebesgue integrals and Riemann-Stieltjes integrals .... 111
8.5. Absolute continuity and densities .... 115
8.6. Integration with respect to counting measure .... 118




Part 2. Independence and Sums


Chapter 9. Stochastic Independence .... 121
9.1. Definition and basic properties .... 121
9.2. Product measure: finitely many factors .... 127
9.3. The Fubini Theorem .... 130
9.4. Expectations and independence .... 133
9.5. Densities and independence .... 134
9.6. Product probability measure: infinitely many factors .... 136
9.7. The Borel-Cantelli Lemma and independent sequences .... 141
9.8. † Order statistics .... 143
9.9. † Some new distributions involving independence .... 145
Chapter 10. Sums of Independent Random Variables .... 147
10.1. Convolutions of distributions .... 147
10.2. Multinomial distributions .... 152
10.3. Probability generating functions and sums in TE anai 153
10.4. ‡ Dirichlet distributions .... 156
10.5. † Random sums in various settings .... 159
Chapter 11. Random Walk .... 163
11.1. Random sequences .... 163
11.2. Definition and examples .... 164
11.3. Filtrations and stopping times .... 171
11.4. Stopping times and random walks .... 174
11.5. A hitting time example .... 175
11.6. Returns to 0 .... 178
11.7. † Random walks in various settings .... 183
Chapter 12. Theorems of A.S. Convergence .... 185
12.1. Convergence in probability .... 185
12.2. Laws of Large Numbers .... 188
12.3. Applications .... 193
12.4. 0-1 laws .... 196
12.5. Random infinite series .... 198




12.6. The Etemadi Lemma
12.7. † The Kolmogorov Three-Series Theorem
12.8. † The image of a random walk
Chapter 13. Characteristic Functions
13.1. Definition and basic examples
13.2. The Parseval Relation and uniqueness
13.3. Characteristic functions of convolutions
13.4. Symmetrization
13.5. Moment generating functions
13.6. Moment theorems
13.7. Inversion theorems
13.8. Characteristic functions in R^d
13.9. Normal distributions on d-dimensional space
13.10. † An application to random walks on Z
13.11. † An application to the calculation of a sum


Part 3. Convergence in Distribution


Chapter 14. Convergence in Distribution on the Real Line
14.1. Definitions and examples
14.2. Limit distributions for extreme values
14.3. Relationships to other types of convergence
14.4. Convergence conditions for sequences of distributions
14.5. Sequences of distributions on EEEE tua bak ASAE eats
14.6. Relative sequential compactness
14.7. The Continuity Theorem
14.8. Scaling and centering of sequences of distributions
14.9. Characterization of moment generating functions
14.10. Characterization of characteristic functions
Chapter 15. Distributional Limit Theorems for Partial Sums
15.1. Infinite series of independent random variables
15.2. The Law of Large Numbers revisited
15.3. The Classical Central Limit Theorem


15.4. The general setting for iid sequences .... 277
15.5. † Large deviations .... 280
15.6. † Local limit theorems .... 282
Chapter 16. Infinitely Divisible Distributions as Limits .... 289
16.1. Compound Poisson distributions .... 289
16.2. Infinitely divisible distributions on R .... 293
16.3. Lévy-Khinchin representations .... 295
16.4. Infinitely divisible distributions on R^+ .... 300
16.5. ERtens OR OUR” miksarar Ee a erena anai 302
16.6. The triangular array problem: introduction .... 303
16.7. iid triangular arrays .... 306
16.8. Symmetric and nonnegative triangular arrays .... 310
16.9. † General triangular arrays .... 313
Chapter 17. Stable Distributions as Limits .... 323
17.1. Regular variation .... 324
17.2. The stable distributions .... 326
17.3. † Domains of attraction .... 332
17.4. ‡ Domains of strict attraction .... 342
Chapter 18. Convergence in Distribution on Polish Spaces .... 347
18.1. Polish spaces .... 347
18.2. Definition of and criteria for convergence .... 352
18.3. Relative sequential compactness .... 355
18.4. Uniform tightness and the Prohorov Theorem .... 357
18.5. Convergence in product spaces .... 358
18.6. The Continuity Theorem for R^d .... 361
18.7. † The Prohorov metric .... 363
Chapter 19. The Invariance Principle and Brownian Motion .... 367
19.1. Certain sequences of distributions on C[0,1] .... 368
19.2. The existence of and convergence to Wiener measure .... 371
19.3. Some measurable functionals on C[0,1] .... 374
19.4. Brownian motion on [0,∞) .... 379


19.5. Filtrations and stopping times .... 382
19.6. Brownian motion, filtrations, and stopping times .... 385
19.7. ‡ Characterization of Brownian motion .... 389
19.8. † Law of the Iterated Logarithm .... 390


Part 4. Conditioning


Chapter 20. Spaces of Random Variables .... 395
20.1. Hilbert spaces .... 395
20.2. The Hilbert space L_2(Ω, F, P) .... 397
20.3. The metric space LOE P Gy .... 401
20.4. tepest linear eSa O since ae ne eee adn waa ee ee ae ee ees 402
Chapter 21. Conditional Probabilities .... 403
21.1. The construction of conditional probabilities .... 403
21.2. Conditional distributions .... 412
21.3. Conditional densities .... 416
21.4. Existence and uniqueness of conditional distributions .... 417
21.5. Conditional independence .... 422
21.6. ‡ Conditional distributions of normal random vectors .... 427
Chapter 22. Construction of Random Sequences .... 429
22.1. The basic result .... 429
22.2. Construction of exchangeable sequences .... 433
22.3. Construction of Markov sequences .... 436
22.4. Poara esoteerik ave ene E Sh aaa ns Se Raa TAESTE Bae Sak 437
22.5. PeOUupOM COE NE (esecmedineetsnee ne neta tes hats Abeer ey 439
Chapter 23. Conditional Expectations .... 443
23.1. Definition of conditional expectation .... 443
23.2. Conditional versions of unconditional theorems .... 448
23.3. Formulas for conditional expectations .... 451
23.4. Conditional variance .... 453


Part 5. Random Sequences


Chapter 24. Martingales .... 459
24.1. Basic definitions .... 459
24.2. Examples .... 461
24.3. Doob decomposition .... 465
24.4. Transformations of submartingales .... 466
24.5. Another transformation: optional sampling .... 467
24.6. Applications of optional sampling .... 473
24.7. Inequalities and convergence results .... 477
24.8. † Optimal strategy in Red and Black .... 484
Chapter 25. Renewal Sequences .... 489
25.1. Basic criterion .... 490
25.2. Renewal measures and potential measures .... 491
25.3. Ean OSE hacen aati Seg wats See RA Sle a a Page Ow A ected ace 494
25.4. Renewal theory: a first step .... 497
25.5. Delayed renewal sequences .... 499
25.6. The Renewal Theorem .... 502
25.7. † Applications to random walks .... 506
Chapter 26. Time-homogeneous Markov Sequences .... 511
26.1. Transition operators and discrete generators .... 511
26.2. Examples .... 515
26.3. Martingales and the strong Markov property .... 520
26.4. Hitting times and return times .... 522
26.5. Renewal theory and Markov sequences .... 525
26.6. Irreducible Markov sequences .... 527
26.7. Equilibrium distributions .... 528
Chapter 27. Exchangeable Sequences .... 533
27.1. Finite exchangeable sequences .... 533
27.2. Infinite exchangeable sequences .... 539
27.3. Posterior distributions .... 542
27.4. † Generalization to Borel spaces .... 544


27.5. ‡ Ferguson distributions and Blackwell-MacQueen urns .... 550
Chapter 28. Stationary Sequences .... 553
28.1. Definitions .... 553
28.2. Notation .... 555
28.3. Examples .... 556
28.4. The Birkhoff Ergodic Theorem .... 558
28.5. Ergodicity .... 561
28.6. † The Kingman-Liggett Subadditive Ergodic Theorem .... 564
28.7. ‡ Spectral analysis of stationary sequences .... 571


Part 6. Stochastic Processes


Chapter 29. Point Processes .... 581
29.1. Point processes as random Radon measures .... 581
29.2. Intensity measures .... 586
29.3. Poisson point processes .... 587
29.4. Examples of Poisson point processes .... 589
29.5. † Probability generating functionals .... 594
29.6. ‡ Operations on point processes .... 597
29.7. ‡ Convergence in distribution for point processes .... 598
Chapter 30. Lévy Processes .... 601
30.1. Measurable spaces of right-continuous functions .... 601
30.2. Definition of Lévy process .... 602
30.3. Construction of Lévy processes .... 605
30.4. Filtrations and stopping times .... 610
305 OUUDONGINAION 222 cancws este tesa ne cnagd tiga gee meen emun eRe 611
30.6. ‡ Local-time processes and regenerative subsets of [0, ∞) .... 612
30.7. ‡ Sample function properties of subordinators .... 618
Chapter 31. Introduction to Markov Processes .... 621
31.1. Cadlag space .... 621
31.2. Markov, strong Markov, and Feller processes .... 622
31.3. Infinitesimal generators .... 628
31.4. The martingale problem .... 629




31.5. Pure-jump Markov processes: bounded rates .... 632
31.6. Pure-jump Markov processes: unbounded rates .... 636
31.7. ‡ Renewal theory for pure-jump Markov processes .... 639
Chapter 32. Interacting Particle Systems .... 641
32.1. Configuration spaces and infinitesimal generators .... 641
32.2. The universal coupling .... 644
E DE e E E E EE E E E E 651
32.4. Equilibrium distributions .... 655
32.5. Systems with attractive infinitesimal generators .... 657
Chapter 33. Diffusions and Stochastic Calculus .... 661
33.1. Stochastic difference equations .... 661
33.2. The Itô integral .... 663
33.3. Stochastic differentials and the Itô Lemma .... 668
33.4. Autonomous stochastic differential equations .... 672
33.5. Generators and the Dirichlet problem .... 679
33.6. Diffusions in higher dimensions .... 682


Part 7. Appendices


Appendix A. Notation and Usage of Terms .... 687
A.1. Symbols .... 687
A.2. Usage .... 690
A.3. Exercises on subtle distinctions .... 692
Appendix B. Metric Spaces .... 693
Ba (DennitiCnn s 14. c10tn.cnaud gud a Betoreeeasew sav bie conse ee 693
B.2. Sequences .... 694
B.3. Continuous functions .... 695
B.4. Important metric spaces .... 695
Appendix C. Topological Spaces .... 697
C.1. Concepts .... 697
C.2. Compactification .... 699
C.3. Product topologies .... 700
C.4. Relative topology .... 700




C.5. Limits and continuous functions .... 701
Appendix D. Riemann-Stieltjes Integration .... 703
D.1. The Riemann-Stieltjes integral .... 703
D.2. Relation to the Riemann integral .... 705
D.3. Change of variables .... 706
D.4. Integration by parts .... 707
D.5. Improper Riemann-Stieltjes integrals .... 708
Appendix E. Taylor Approximations, C-Valued Logarithms .... 709
E.1. Some inequalities based on the Taylor formula .... 709
E.2. Complex exponentials and logarithms .... 711
E.3. Approximations of general C-valued functions .... 714
Appendix F. Bibliography .... 715
Appendix G. Comments and Credits .... 723


List of Tables 


5.1 Basic facts about some distributions on Z^+ .... 60
5.2 Basic facts about some distributions with densities .... 61
8.1 Gamma function, Stirling Formula, and relative error .... 114
13.1 Characteristic functions of some continuous distributions .... 212
13.2 Characteristic functions of some discrete distributions .... 213
13.3 Moment generating functions of some distributions .... 221


Preface 


Overview 


This book is intended as a textbook in probability for graduate students in math- 
ematics and related areas such as statistics, economics, physics, and operations 
research. Probability theory is a ‘difficult’ but productive marriage of mathemat- 
ical abstraction and everyday intuition, and we have attempted to exhibit this 
fact. Thus we may appear at times to be obsessively careful in our presentation 
of the material, but our experience has shown that many students find them- 
selves quite handicapped because they have never properly come to grips with 
the subtleties of the definitions and mathematical structures that form the foun- 
dation of the field. Also, students may find many of the examples and problems 
to be computationally challenging, but it is our belief that one of the fascinat- 
ing aspects of probability theory is its ability to say something concrete about 
the world around us, and we have done our best to coax the student into doing 
explicit calculations, often in the context of apparently elementary models. 

The practical applications of probability theory to various scientific fields 
are far-reaching, and a specialized treatment would be required to do justice 
to the interrelations between probability and any one of these areas. However, 
to give the reader a taste of the possibilities, we have included some examples, 
particularly from the field of statistics, such as order statistics, Dirichlet distri- 
butions, and minimum variance unbiased estimation. We have also given several 
examples of random geometrical structures, involving exact computations where 
possible. And of course, a variety of models such as coin-tossing and urns appear 
repeatedly. 

If little or no material is omitted, the book is suitable for a 3-semester 
sequence. We feel that an incoming graduate student in mathematics could 
begin her or his graduate studies with this book and by the middle of the second 
year be ready to do the reading necessary to begin research in probability theory. 
Later in this preface we give some suggestions for 2-semester sequences. 




What is ‘modern’ about our approach? Three main features come to mind. 
First, there is our philosophy that random variables are functions that may take 
values in a variety of spaces, not just R or R^d. Second, we have endeavored to
employ the most up-to-date methodology in constructions and proofs. Third, 
we have included material that is of relatively recent vintage. Here are some of 
the consequences. (i) Random sequences, random functions, random sets, and 
random distributions are all presented within the same framework as R-valued 
random variables. (ii) We accommodate the value ∞ wherever possible, such as
in our treatment of nonnegative random variables. (iii) Except in our treatment 
of continuous-time stochastic processes, abstract topology plays little or no role 
in the book. (iv) We minimize the technicalities associated with continuous-time 
stochastic processes by using constructions based on almost sure convergence and 
convergence in probability. (v) Conditional expectations are defined as means 
of random distributions, and as such, naturally inherit properties of (uncondi- 
tional) expectations. (vi) Relatively recent proofs are given of several standard 
results, including the Strong Law of Large Numbers, the Renewal Theorem, the 
De Finetti Theorem, and the representation theorem for Lévy processes. (vii) 
Poisson point processes, interacting particle systems, coupling methods, optimal 
strategies in gambling, and the martingale problem are all examples of important 
newer topics not typically presented in a general introductory graduate textbook. 


Structure of book 


The prerequisite knowledge needed is an understanding of elementary linear al- 
gebra and advanced calculus. Measure theory is introduced as needed mostly 
within the first nine chapters. One of the pleasant aspects of approaching mea- 
sure theory in this manner is that probability nicely motivates many of its ab- 
stract ideas in a way that is not possible in a standard real analysis course. 

At the beginning of the book, the pace is somewhat leisurely as we introduce 
probability spaces, random variables, and some important families of distribu- 
tions. As the book progresses our expectations of the student gradually increase, 
as we endeavor to help her or him prepare to read various specialized books and 
papers. 

There are more than 1200 problems. The asterisks (*) next to about 300 of 
them indicate that some sort of ‘answers’ are available at the Web site 


http://www.birkhauser.com/books/ISBN/0-8176-3807-5


Such an ‘answer’ may be a complete solution, a solution to part of the problem, 
just a numerical answer, or merely a hint. 

The problems are integrated with the text. Thus a problem coming
immediately after a theorem and its proof is likely to be related to that theorem, but
often in a way that forces the student to review earlier material. The asterisk is
attached to a wide variety of problems: easy and hard, theoretical and
calculational, specific and open-ended. Readers should choose to do all sorts, including
many for which an ‘answer’ is not given on the Web. One of the many skills that
the problems are designed to foster is that of learning to self-check solutions.

Despite its advantages, there is one significant drawback to interweaving 
problems with the textual material itself. This format tends to make one feel 
that one cannot continue until one has done all relevant problems, especially 
the problems that request a proof of the preceding proposition or theorem. We 
believe this approach to be self-defeating. Often by moving on, one gains a 
perspective that enables one to return successfully to a difficult problem. 

We urge the reader to become familiar with the appendices early so as to 
be aware of what can be found there as needed. 

The latter section or sections of some chapters are listed as optional. This 
practice does not reflect our view of how important certain material is, but it 
does indicate our view that a coherent well-structured body of knowledge is 
contained in the nonoptional sections of any given chapter. 


Organization of a sequence of courses 


It is not required that chapters be studied in the order presented. In particular, 
some may want to study Part 4 before Part 3. The dependency relations among 
the chapters are shown in the chart below. There is one minor exception: the 
proof of Lemma 20 in Chapter 21 relies on material from Section 18.7. 


Those who have had a course in measure-theoretic analysis may find that only 
a light reading of some chapters is necessary, primarily to become familiar with 
terminology used in probability theory. These chapters are 




• Chapter 2: Random Variables

• Chapter 4: Expectations: Theory

• Chapter 8: Integration Theory

• Chapter 20: Spaces of Random Variables


Also, light reading may be all that is necessary for Sections 2 and 3 of Chapter 
7, although most who have learned an existence proof for measure have probably 
learned one that is different from the one in this book. Finally, Sections 2 and 
3 of Chapter 9 are review for those who have studied product measure and the 
Fubini Theorem, and large portions of Chapter 13 may be review for those who 
have studied Fourier analysis. 

In some cases, the proof of a result is best skipped on a first reading, even 
though an awareness of the result itself is important. Here is a list of proofs 
which might fall into this category: 

• the proof of Theorem 14 of Chapter 5, which characterizes the functions
that are probability generating functions;

• the proofs of the lemmas and theorems in Sections 2 and 3 of Chapter 7,
which are used for proving the extension theorem for measure;

• the proof of Theorem 13 of Chapter 13, which identifies the class of
characteristic functions of integer-valued random variables;

• the proof of Lemma 23 of Chapter 14, which is the cornerstone for proving
that the limiting type of a normalized sequence is unique;

• the proofs of the lemma and theorem in Section 10 of Chapter 14, the latter
of which asserts that certain positive definite functions are characteristic
functions of real-valued random variables;

• the proof of Lemma 21 of Chapter 21, which says that the space of
probability measures on a Borel space is a Borel space.


The optional sections at the end of some chapters are indicated by daggers
(†) or double daggers (‡). The symbol ‡ is a warning that this optional section has
as a prerequisite either a chapter that is not indicated by the dependency chart
or another optional section. Here is a description of the additional prerequisite
material needed for such optional sections:


Section 10.4 requires Section 9.8 
Section 17.4 requires Section 17.3 
Section 19.7 requires Chapter 16 
Section 21.6 requires Chapter 13 
Section 27.5 requires Sections 10.4 and 27.4 
Section 28.7 requires Chapter 19 
Section 29.6 requires Section 29.5 
Section 29.7 requires Section 29.5 
Section 30.6 requires Chapter 25 
Section 30.7 requires Chapter 24 
Section 31.7 requires Section 30.6 




Occasionally some optional section is relevant for a later problem in a nonop- 
tional section. When this happens, the presentation of the problem is such as 
to provide a warning. For instance, when the zeta distribution, treated in an 
optional section, is relevant for a later problem, the term ‘zeta distribution’ ap- 
pears in that problem. Similarly, the ‘Wiener sausage’ problem in Chapter 28 
makes explicit reference to the chapter in which the standard Wiener process is 
introduced, since that chapter is not a prerequisite for Chapter 28. 

Here are the chapter numbers for some possible 2-semester sequences: 

• 1-15, 18-24

• 1-15, 20-25, 26 or 27

• 1-15, 18, 20-24, 29

• 1-17, 20-24

• 1-12, 20-28

Unless the students entering the course are already comfortable with measure-
theoretic analysis, one will probably have to omit some optional sections in order
to complete any of these programs in two semesters. By omitting all optional 
sections of chosen chapters, one may be able to treat more chapters than are 
listed in the above schedules. We recommend against such a plan. We also do 
not recommend covering only a portion of the nonoptional sections of a chapter 
solely for the purpose of being able to touch on other chapters. We think that 
by moving rapidly but lightly one deprives the students of the in-depth training 
they will need in order to handle further reading and problem-solving on their 
own. 


Acknowledgements 


Preliminary versions of this text have been used for several years at the Uni- 
versity of Minnesota and at some other schools. We thank many students and 
faculty for feedback. In particular, when Bisser Roussanov was a student at the 
University of Minnesota, he found many errors of the hard-to-spot variety in an 
early version. 

In many cases, we have not identified a particular source in the literature 
for a theorem or proof, because in our judgment, the result may be viewed 
as somewhat widely known. However, Appendix G does cite sources for some 
proofs and theorems that we feel are not widely known. Some of these sources 
are colleagues at the University of Minnesota who have made conversational 
contributions to this book beyond the specific ones mentioned in Appendix G, 
and for these mathematical communications we are grateful. Finally we thank 
Chuck Newman, Loren Pitt, and Tom Liggett for looking at preliminary versions 
of the text and making several useful comments. 


PART 1 


Probability Spaces, Random Variables, 
and Expectations 




A probability space is the mathematical abstraction of an experiment involv- 
ing some randomness, a randomness that may only be in the eye of the observer 
as a consequence of a lack of full knowledge. The basic ingredients that constitute 
a probability space are discussed in Chapter 1. 

‘Random variables’ represent observations or measurements based on the out- 
comes of experiments. Chapter 2 contains basic definitions and examples con- 
cerning random variables, while Chapter 3 introduces a fundamental concept 
that is used to both categorize and analyze random variables, namely the ‘dis- 
tribution’ of a random variable. 

The ‘expectation’ of a real-valued random variable denotes its a priori aver- 
age value. The theory of expectations is developed in Chapter 4. An important 
theorem in probability theory, the Law of Large Numbers, shows that the ex- 
pectation also represents the long term average of what is observed or measured 
when an experiment is repeated many times. Chapter 5 contains the statement 
and proof of the Law of Large Numbers, along with definitions and further results 
about expectations. 

The theoretical foundations of probability theory are to be found in measure 
and integration theory. Quite a bit of measure theory is already contained in 
Chapters 1 through 4, but a more thorough treatment, including a detailed solu- 
tion of certain significant technical problems involving existence and uniqueness, 
is reserved for Chapters 6 and 7. Integration theory, which appears in somewhat 
specialized and slightly disguised form in Chapter 4, is given a general treatment 
in Chapter 8. 


CHAPTER 1 
Probability Spaces 


In modern probability theory, a fundamental building block is the ‘probability 
space’, a concept that is to be precisely defined in the latter portion of this 
chapter. We begin the chapter informally by giving some concrete examples of 
probability spaces. In particular, we model the experiment of tossing a fair coin 
infinitely many times. 


1.1. Introductory examples 


In each example given below, we will define three objects: (i) a set Ω, representing
the possible outcomes of an experiment; (ii) a collection F of subsets of Ω; and
(iii) a function P: F → [0,1].


Example 1. To model the experiment of rolling an ordinary die one time,
we let Ω denote the set of outcomes of this experiment. Ignoring such aspects of
the experiment as the duration of time it takes for the die to stop rolling or the
distance it travels, we focus on the six outcomes that correspond to the number
of dots that show on the top face when the die comes to rest. So, for our model
we set

\[ \Omega = \{1,2,3,4,5,6\}. \]


We have idealized somewhat by not including as possible outcomes such unusual 
occurrences as the die coming to rest on an edge. 

In some circumstances the word ‘outcome’ might be used with a different
meaning. For example, a person might say that he or she is hoping that the
outcome is an even number. In so doing, this person is using ‘outcome’ to denote
a subset of Ω rather than a member of Ω. With an appropriate description one
might refer to any subset of Ω. We let F denote the collection of all subsets of
Ω. Three terms are often used in lieu of the usual set-theoretic terminology: Ω
is called the ‘sample space’, the members of Ω are called ‘sample points’, and
the members of F are called ‘events’. If A is an event, we say that A ‘occurs’ if
the result of the experiment is a sample point that is a member of A. Thus, in
the experiment under discussion, if A = {2,4,6}, then saying that “A occurs” is
the same as saying that an “even number is rolled”.

The probability of an event A, denoted by P(A), indicates the likelihood that
the event occurs. It is usual in the case of a balanced die to define

\[ P(A) = \frac{\# A}{6}, \qquad A \in \mathcal{F}, \]

where #A indicates the number of members of A. In particular, P(Ω) = 1 and
P(∅) = 0, which is consistent with the fact that in our idealized experiment,
the outcome of the experiment is always a member of Ω and never a member of
∅. Another way of saying this is that Ω is an event which is certain to occur,
while ∅ is an event which is certain to not occur. The function P is called
a ‘probability measure’ and the number P(A) is called the ‘probability of A’.
The particular P we have chosen reflects the geometrical symmetry of a die,
but it should be emphasized that what makes a particular choice of P correct
is not abstract reasoning but the degree to which P reflects the nature of the
experiment. In fact, one of the main tasks for statisticians is to devise ways of
choosing approximately correct probability measures P for various experiments.
Probabilists, on the other hand, usually take P as given and obtain various
consequences.

According to the principles of classical mechanics, one might argue that
whenever a die is rolled, there is one sample point that has probability 1; that sample
point could be determined via a complicated calculation involving the laws of
physics and an exact knowledge of the initial conditions and the motion of the
hand rolling the die. Typically, this knowledge and the required calculational
ability are missing, and it is for this reason that a probabilistic model is
appropriate.
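As an illustration, the model of Example 1 can be sketched in a few lines of Python. This is only a sketch: the names OMEGA and prob are ours, and exact rational arithmetic is used so that the values of P are not blurred by rounding.

```python
from fractions import Fraction

# Sample space of Example 1 (the name OMEGA is ours, for illustration only).
OMEGA = frozenset({1, 2, 3, 4, 5, 6})


def prob(event):
    """Return P(A) = #A / 6 for a subset A of OMEGA (balanced-die model)."""
    event = frozenset(event)
    if not event <= OMEGA:
        raise ValueError("an event must be a subset of the sample space")
    return Fraction(len(event), len(OMEGA))


even = {2, 4, 6}                   # the event "an even number is rolled"
print(prob(even))                  # probability of the event A = {2, 4, 6}
print(prob(OMEGA), prob(set()))    # the certain event and the empty event
```

Every one of the 64 subsets of the sample space is an acceptable argument of prob, in keeping with the choice of F as the collection of all subsets.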


Example 2. We wish to model the experiment of drawing an object at ran- 
dom from a container that contains several objects. We assume that each of the 
objects is equally likely to be drawn. In probability theory, it is customary to 
call such a container an urn. Suppose that the urn contains 3 green balls and 
5 blue balls. If we have no reason to distinguish the balls any further than by 
color, we might choose 

\[ \Omega = \{\text{green}, \text{blue}\}, \]


and let F be the collection of all subsets of Ω. There are only 4 events in F: Ω,
∅, {green}, and {blue}. We assign these events probabilities 1, 0, 3/8 and 5/8
respectively. If we had wanted to consider like-colored balls to be different from
one another, we could have instead chosen a sample space with 8 sample points,
and then followed the pattern of Example 1 by defining the probability of an
event A in terms of #A.




Example 3. Consider the experiment consisting of n successive flips of a
coin. Then

\[ \Omega = \{(\omega_1,\dots,\omega_n) : \omega_i \in \{\text{heads}, \text{tails}\},\ i = 1,\dots,n\}. \]

For convenience we may want codes for ‘heads’ and ‘tails’, say, 1 for ‘heads’ and
0 for ‘tails’. Thus,

\[ \Omega = \{(\omega_1,\dots,\omega_n) : \omega_i \in \{1,0\},\ i = 1,\dots,n\}. \]

There are 2^n sample points. As in Example 1 we let F be the collection of all
subsets of Ω. Set

\[ P(A) = \frac{\# A}{2^n}, \qquad A \in \mathcal{F}. \]

Suppose 0 ≤ k ≤ n. In order that ∑_{i=1}^n ω_i = k, it is necessary and sufficient
that exactly k of the ω_i’s equal 1. There are

\[ \binom{n}{k} \]

sample points that have this property. Hence,

\[ P\Bigl(\Bigl\{(\omega_1,\dots,\omega_n) : \sum_{i=1}^{n} \omega_i = k\Bigr\}\Bigr) = \binom{n}{k}\frac{1}{2^n}, \qquad 0 \le k \le n. \]

In coin-flipping terminology this is the probability of obtaining exactly k heads
in n flips of a fair coin.
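For small n the counting argument of Example 3 can be checked by brute force. The following sketch (the function name prob_k_heads is ours) lists all 2^n sample points, counts those with exactly k ones, and compares the result with the binomial formula.

```python
from fractions import Fraction
from itertools import product
from math import comb


def prob_k_heads(n, k):
    """P(exactly k heads in n flips), by enumerating the 2**n sample points."""
    omega = list(product((0, 1), repeat=n))          # 1 codes heads, 0 tails
    favorable = [w for w in omega if sum(w) == k]    # sample points with k ones
    return Fraction(len(favorable), len(omega))


n = 5
for k in range(n + 1):
    # Compare the enumeration with the closed form binom(n, k) / 2**n.
    print(k, prob_k_heads(n, k), Fraction(comb(n, k), 2 ** n))
```

The enumeration is exponential in n, so it is useful only as a check on the formula, not as a method of computation.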


Problem 1. In Example 3 how many members does F have? 


Problem 2. For Example 3 calculate the probability of obtaining an even number 
of heads, as a function of n. 


Problem 3. Example 3 really consists of infinitely many examples, one for each n.
For each n ≥ 2 it is meaningful to speak of the probability that the first two flips
are both tails. Show that this probability is independent of n. Generalize.


The next example encompasses the infinitely many examples in Example 3 
within one big example. 


Example 4. Consider the experiment consisting of an infinite sequence of
coin flips. Then

\[ \Omega = \{(\omega_1,\omega_2,\dots) : \omega_i \in \{1,0\},\ i = 1,2,\dots\}. \]

Since Ω has infinitely many members, the procedure used in Example 1 and
Example 3 for defining P cannot be used here. However, we do know what we
want P(A) to equal for certain subsets of Ω, in order to have consistency with
Example 3. For any set A_k of k-dimensional vectors of 0’s and 1’s, we want

\[ P\bigl(\{(\omega_1,\omega_2,\dots) : (\omega_1,\dots,\omega_k) \in A_k\}\bigr) = \frac{\# A_k}{2^k}. \tag{1.1} \]

We must wait for precise definitions before completing the description of P and
considering F.


* Problem 4. In Example 4 calculate, for j = 1,2,..., the probability that the first
head occurs on flip number j.


1.2. Ingredients of probability spaces 


The first object needed for the definition of ‘probability space’ is a set, often
denoted by Ω. The second object is found in the following definition.


Definition 1. A σ-field of subsets of a set Ω is a collection F of subsets of
Ω that has ∅ as a member and is closed under complementation and countable
unions. The sets that are members of the σ-field are said to be measurable with
respect to F, or F-measurable, or just measurable if the σ-field is understood
from the context.


We will only consider probability models in which the collection of events F
is a σ-field. Here are some reasons for that restriction. When we construct a
probability space, we would ideally want the collection F of events to contain
all the subsets of Ω that might possibly be of interest to us. It seems reasonable
that a theory that allows us to speak of an event occurring should also allow
us to speak of it not occurring, so F should be closed under complementation.
Also, we would like to speak of ‘at least one’ of several things occurring, provided
that we can speak of them individually occurring, so F should be closed under
unions. It is not so clear whether ‘several’ should mean ‘finitely many’, ‘countably
many’, or even more. By requiring that F be a σ-field, we have chosen to have
‘several’ mean ‘countably many’. It is to be understood that ‘countably many’
encompasses ‘finitely many’, and that ‘countable’ encompasses ‘finite’. When we
want to exclude finiteness, we will use a phrase like ‘countably infinite’.
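When the sample space is finite, every countable union of subsets is a finite union, so the conditions of Definition 1 can be checked mechanically. The sketch below does this; the function name is_sigma_field is ours, and the reduction to pairwise unions is valid only under the stated assumption that the sample space is finite.

```python
from itertools import combinations


def is_sigma_field(omega, collection):
    """Check Definition 1 for a collection of subsets of a FINITE set omega."""
    omega = frozenset(omega)
    sets = {frozenset(s) for s in collection}
    if any(not s <= omega for s in sets):
        return False                      # not a collection of subsets of omega
    if frozenset() not in sets:
        return False                      # the empty set must be a member
    if any(omega - a not in sets for a in sets):
        return False                      # closed under complementation
    # For finite omega, closure under pairwise unions gives all countable unions.
    return all(a | b in sets for a, b in combinations(sets, 2))


omega = {1, 2, 3, 4}
print(is_sigma_field(omega, [set(), {1, 2}, {3, 4}, omega]))   # a four-member candidate
print(is_sigma_field(omega, [set(), {1}, omega]))              # lacks the complement of {1}
```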


Problem 5. Show that if F is a σ-field of subsets of Ω, then Ω is a member of F
and F is closed under countable intersections. Also show that F is closed under
set differences: if A and B are in F, then so is A \ B.


A sequence or collection S of sets is said to be pairwise disjoint if A ∩ B = ∅
whenever A and B are distinct terms or members of S. This terminology is used
in the following definition, which introduces the third and final object needed in
the definition of ‘probability space’.


Definition 2. A probability measure P on a σ-field F of subsets of a set Ω is
a function from F to the unit interval [0,1] such that P(Ω) = 1 and

\[ P\Bigl(\bigcup_{m=1}^{\infty} A_m\Bigr) = \sum_{m=1}^{\infty} P(A_m) \]

for each pairwise disjoint sequence (A_m: m = 1,2,3,...) of members of F.
Because P satisfies this summation condition it is said to be countably additive.
For A ∈ F, P(A) is the probability of A.


Problem 6. Show that countable additivity encompasses finite additivity:

\[ P\Bigl(\bigcup_{m=1}^{n} A_m\Bigr) = \sum_{m=1}^{n} P(A_m) \]

for each pairwise disjoint finite sequence (A_1,..., A_n) of members of F.


Definition 3. A probability space is a triple (Ω, F, P), where Ω is a set, F
is a σ-field of subsets of Ω, and P is a probability measure on F. The set Ω is
called the sample space and its members are called sample points. The members
of F are called events.


Given a probability space (Ω, F, P) we sometimes say that P is a probability
measure on Ω or on (Ω, F), rather than on F.


Proposition 4. Let A ⊂ B and A_n, n = 1,2,..., be events in a probability
space (Ω, F, P). Then P(∅) = 0, P(A) ≤ P(B), P(B \ A) = P(B) − P(A),
P(A^c) = 1 − P(A), and

\[ P\Bigl(\bigcup_{n=1}^{\infty} A_n\Bigr) \le \sum_{n=1}^{\infty} P(A_n). \]


Problem 7. Prove the preceding proposition. 


Problem 8. Verify that probability spaces have been defined in Example 1, Exam- 
ple 2, and Example 3. 


In most probability spaces, if ω is a sample point, the set {ω} is an event.
When it is, we will often not distinguish between the sample point ω and the
event {ω}. Thus, for example, we talk of the probability of a sample point when
we mean the probability of the event consisting of that one sample point.

For models with countably many sample points it is usual for F to be the
σ-field of all subsets of Ω; indeed, this must be the case if each sample point
constitutes an event.


Problem 9. Let A be an event which contains countably many sample points. As- 
sume that each of the sample points in A is an event. Find a formula for the 
probability of A in terms of the probabilities of the sample points in A. 


* Problem 10. Use the product of the sample space of Example 1 with itself to 
construct a probability space for the experiment of rolling two ordinary dice, one 
of them being red and the other green. Your model should contain 36 equally likely 
sample points. 


* Problem 11. For the probability space of Example 4 show that each sample point 
is an event that has probability 0. What implicit role does this last fact play in 
the command: Flip a fair coin repeatedly until the first tails occurs? 


Example 5. Suppose two identical dice are rolled. One way of constructing
a sample space for this experiment is to use 21 sample points (ω_1, ω_2), 1 ≤
ω_1 ≤ ω_2 ≤ 6, with the smaller of the two numbers on the dice equaling ω_1 and
the larger equaling ω_2. Consistency with the model constructed in Problem 10
requires that each of the 6 sample points of the form (ω_1, ω_1) have probability
1/36, and that each of the remaining 15 sample points have probability 1/18.

There is another way of making the natural connection between the
experiment of Problem 10 and the example under discussion. For the experiment
of throwing two identical dice, use the same 36 sample points that were used in
Problem 10 but do not use the σ-field of all subsets of Ω. Rather, use the smallest
σ-field containing all sets of the forms {(ω_1, ω_1)} and {(ω_1, ω_2), (ω_2, ω_1)}.
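The probabilities 1/36 and 1/18 in Example 5 can be recovered by collapsing the 36 equally likely ordered pairs onto the 21 pairs (smaller, larger). The sketch below takes the 36-point model for granted; the variable names ordered and induced are ours.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# The 36 equally likely ordered pairs (red die, green die).
ordered = list(product(range(1, 7), repeat=2))

# Send each ordered pair to (smaller, larger) and add up the probabilities.
induced = defaultdict(Fraction)
for red, green in ordered:
    key = (min(red, green), max(red, green))
    induced[key] += Fraction(1, 36)

print(len(induced))        # number of sample points in the 21-point model
print(induced[(3, 3)])     # a point of the form (w, w)
print(induced[(2, 5)])     # a point with distinct coordinates
```

Each unordered pair with distinct coordinates collects the mass of exactly two ordered pairs, which is the content of the second construction described above.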


1.3. σ-fields


It is often the case that in a probabilistic model, the events of interest form a
collection E of subsets of Ω that is not a σ-field. Since the theory requires that
we work with σ-fields, we replace E in such cases by a larger collection of subsets
that does form a σ-field.


Definition and Proposition 5. Let E be a collection of subsets of a set Ω.
Then there exists a unique σ-field F ⊃ E such that if G ⊃ E and G is a
σ-field, then G ⊃ F; F is the smallest σ-field containing E. It is called the σ-field
generated by E; we write F = σ(E).


PROOF. The uniqueness is clear. To prove the existence define

\[ \mathcal{F} = \bigcap_{\substack{\mathcal{G} \supset \mathcal{E} \\ \mathcal{G} \text{ is a } \sigma\text{-field}}} \mathcal{G}. \]

We finish the proof by showing that F is a σ-field. Clearly, ∅ ∈ F. Fix A ∈ F
and let G be any σ-field that contains E. Since A ∈ G and G is a σ-field, A^c is a
member of G. Hence A^c ∈ F. Next let (A_1, A_2,...) be any sequence of members
of F, and again let G be any σ-field that contains E. Then each A_n is a member
of G, and since G is a σ-field, ⋃_{n=1}^∞ A_n is a member of G. It follows that ⋃_{n=1}^∞ A_n
is a member of F. We have shown that F is closed under complementation and
countable unions, so F is a σ-field. □


In the context of Example 4, let E be the collection of sets of the form

\[ \{(\omega_1,\omega_2,\dots) : (\omega_1,\dots,\omega_k) = (\varepsilon_1,\dots,\varepsilon_k)\} \tag{1.2} \]

for positive integers k and vectors (ε_1,...,ε_k) of 0’s and 1’s. The formula in
(1.1) defines probabilities P(A) for all sets A ∈ E. By using countable additivity
in various ways, the reader should be able to determine probabilities for many
events in σ(E). It will be shown in Chapter 7 that probabilities can be determined
in a unique way for all of the events in σ(E).


Problem 12. In each of the following, a subset of the probability space in Example 4
is described. In each case, show that this subset is an event (that is, a member of
σ(E)), and calculate its probability.
(i) The first head comes immediately after an even number of tails. 
(ii) At six flips, but no earlier, the number of heads equals the number of tails. 
(iii) The sequence ‘heads, tails, heads’ occurs before the sequence ‘heads, heads, 
heads’. 
(iv) The sequence ‘heads, heads, tails’ occurs before the sequence ‘tails, heads, 
tails’. 


More generally, consider an arbitrary sample space Ω. One may have a family
E of subsets that are particularly interesting or easy to describe. Then, when
trying to build a probability space (Ω, F, P), it is natural to take F = σ(E) and
to begin constructing P by specifying its values on E. Two questions immediately
arise. Can the domain of P be extended to F so that it is a probability measure?
If so, is such an extension unique? Both questions will be treated in Chapter 7.
Here is a counterexample that sheds some light on the uniqueness question.


(Counter)example 6. Let Ω = {α, β, γ, δ}, let F denote the σ-field of all
subsets of Ω, and let E = {{α, β}, {β, γ}}. Then σ(E) = F. We define two
distinct probability measures P and Q on F whose restrictions to E are identical:
define P by P({β}) = P({δ}) = 1/2 and Q by Q({α}) = Q({γ}) = 1/2.
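Both assertions of the counterexample, that the two generating sets produce all 16 subsets and that the two measures agree on the generators without being equal, can be verified by direct computation. The following sketch is ours: the helper names generated_sigma_field and measure are hypothetical, the Greek letters are spelled out as strings, and the closure procedure relies on the sample space being finite.

```python
from fractions import Fraction


def generated_sigma_field(omega, generators):
    """Smallest sigma-field on a FINITE set omega containing the generators."""
    omega = frozenset(omega)
    field = {frozenset(), omega} | {frozenset(g) for g in generators}
    while True:
        # Add all complements and all pairwise unions until nothing new appears.
        closure = {omega - a for a in field} | {a | b for a in field for b in field}
        if closure <= field:
            return field
        field |= closure


def measure(point_masses):
    """Probability measure on all subsets, determined by its point masses."""
    def prob(event):
        return sum((point_masses.get(w, Fraction(0)) for w in event), Fraction(0))
    return prob


OMEGA = {"alpha", "beta", "gamma", "delta"}
E = [{"alpha", "beta"}, {"beta", "gamma"}]
P = measure({"beta": Fraction(1, 2), "delta": Fraction(1, 2)})
Q = measure({"alpha": Fraction(1, 2), "gamma": Fraction(1, 2)})

sigma_E = generated_sigma_field(OMEGA, E)
print(len(sigma_E))                    # number of members of the generated sigma-field
print([P(A) == Q(A) for A in E])       # agreement of P and Q on the generators
print(P({"beta"}), Q({"beta"}))        # an event on which P and Q differ
```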


1.4. Borel σ-fields


When constructing a probability space starting with a set Ω, it is often the case
that Ω is a topological space (see Appendix C). When this is the case it is
implicit, unless otherwise stated, that the σ-field of interest is the one generated
by the collection of open sets. This σ-field is called the Borel σ-field. The
members of the Borel σ-field are called Borel sets, or Borel subsets if the relation
to the entire topological space is being emphasized. The symbols A, B, and C
will be reserved for Borel σ-fields, but this does not mean that a σ-field carrying,
say, the name F is not a Borel σ-field.

In any topological space, all closed sets, being the complements of open sets,
are Borel sets. Typically there are Borel sets that are neither open nor closed;
the intersection of countably many open sets is Borel but may be neither open
nor closed.

The real line, which will be denoted by R throughout the book, is a topological
space. In R, the interval [0, 1), for instance, is a Borel set because

\[ [0,1) = \bigcap_{n=1}^{\infty} \Bigl(-\frac{1}{n},\, 1\Bigr), \]

a countable intersection of open sets. Indeed all intervals are Borel sets.
Singletons, being closed, are Borel. Thus countable sets, such as the set of rational
numbers, are Borel.


Problem 13. Show that any of the following collections of subsets of R generate
the Borel σ-field of subsets of R: the collection of all closed sets, the collection
of all open intervals, the collection of all bounded intervals open on the left and
closed on the right, the collection of all open intervals having left endpoint −∞, the
collection of all closed intervals, the collection of all open intervals having rational
endpoints, the collection of all intervals of the form (−∞, r] where r is rational,
and the collection of intervals of the form (j/2^n, (j + 1)/2^n], where j is an integer
and n is a nonnegative integer. Add a few more similar collections to the list.


Here is a list of several more topological spaces that are of interest in
probability theory: (i) the set of nonnegative real numbers, denoted by R^+; (ii)
d-dimensional Euclidean space, denoted by R^d; (iii) the infinite-dimensional
space R^∞ (see Example 2 of Appendix C); (iv) the extended real line R̄ =
R ∪ {−∞} ∪ {∞}; (v) the nonnegative members of R̄, denoted by R̄^+. The
topology in (i) is the relative topology induced by the usual topology on R. The
topologies in (ii) and (iii) are product topologies. The spaces in (iv) and (v) are
compactifications of R and R^+, respectively, and they receive their topologies
accordingly, as described in Example 1 of Appendix C.


* Problem 14. Show that the members of the Borel σ-field of R^+ with the relative
topology are those Borel subsets of R that contain only nonnegative numbers.


Problem 15. Show that the Borel σ-field of [0, 1) with the relative topology induced
by the usual topology on R is generated by the family of those subsets of [0, 1) that
are open in R.


* Problem 16. Show that the collection of d-dimensional ‘open boxes’ generates the
Borel σ-field in R^d. (An open box is a set of the form I_1 × ··· × I_d, where each I_j
is an open interval in R.)


Problem 17. Show that a subset of R̄ is Borel if and only if it is the union of a
Borel subset of R and one of the four subsets of the set {∞, −∞}. What are the
Borel subsets of R̄^+?


CHAPTER 2
Random Variables 


This chapter treats certain functions having as their domain a probability space. 
Such functions, known as ‘random variables’, have the property that they trans- 
form one probability space into another. In applications, random variables often 
represent what is actually observed in an experiment. Thus, in specific examples, 
it may be more descriptive to call them by such names as ‘random numbers’, 
‘random sequences of heads and tails of a coin’, and ‘random chords of a circle’. 


2.1. Definitions and basic results 


Many of the functions studied in probability theory are R-valued. But functions 
that take values in other spaces are also of interest, the only restriction being 
that these spaces satisfy the following definition. 


Definition 1. A measurable space is a pair (Ψ, G), where Ψ is a nonempty
set and G is a σ-field of subsets of Ψ.


Note that if (Ω, F, P) is a probability space, then (Ω, F) is a measurable space.


Definition 2. Let (Ω, F) and (Ψ, G) be measurable spaces. A measurable
function from (Ω, F) to (Ψ, G) is a function X: Ω → Ψ such that X^{-1}(B) ∈ F
for every B ∈ G. When a probability measure P is attached to the measurable
space (Ω, F), so that (Ω, F, P) is a probability space, X is also called a random
variable from (Ω, F, P) to (Ψ, G).


The language of the preceding definition is often shortened. For instance, if
the σ-fields F and G and the probability measure P are understood from context,
we may say that X is a random variable from Ω to Ψ, or simply that X is a
Ψ-valued random variable.
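On finite spaces the condition in Definition 2 can be tested directly by computing inverse images. The sketch below is ours (the names preimage and is_measurable are hypothetical); it uses a σ-field F on {1,2,3,4} that cannot distinguish 1 from 2 or 3 from 4, so that a function is measurable exactly when it is constant on {1,2} and on {3,4}.

```python
def preimage(X, omega, B):
    """Inverse image of B under X, as a subset of the finite set omega."""
    return frozenset(w for w in omega if X(w) in B)


def is_measurable(X, omega, F, G):
    """Check that the inverse image of every member of G is a member of F."""
    F = {frozenset(a) for a in F}
    return all(preimage(X, omega, B) in F for B in G)


omega = {1, 2, 3, 4}
F = [set(), {1, 2}, {3, 4}, omega]     # sigma-field generated by {1, 2}
G = [set(), {0}, {1}, {0, 1}]          # all subsets of the target {0, 1}

print(is_measurable(lambda w: 0 if w <= 2 else 1, omega, F, G))  # constant on {1,2} and {3,4}
print(is_measurable(lambda w: w % 2, omega, F, G))               # separates 1 from 2
```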

The next two propositions indicate useful methods for deciding whether a 
function from one measurable space to another is measurable. 




Proposition 3. Let X be a function from a measurable space (Ω, F) to a
measurable space (Ψ, G). Suppose E is a family of subsets of Ψ that generates G
and that X^{-1}(B) ∈ F for every B ∈ E. Then X is a measurable function.


Problem 1. Prove the preceding proposition. Hint: Consider the collection H of
all subsets B of Ψ for which X^{-1}(B) ∈ F and show that H is a σ-field.


Proposition 4. Every continuous function from one topological space to
another (or to itself) is measurable, where the relevant σ-fields are the Borel
σ-fields.


* Problem 2. Prove the preceding proposition. 


We have already seen that the domain of a random variable X is a measurable 
space that has been fitted with a probability measure P. As the following result 
shows, X transfers P to a probability measure Q on the target of X in a natural 
way. This ‘induced’ probability measure Q is of central importance in probability 
theory. 


Definition and Proposition 5. Let X be a random variable from a
probability space (Ω, F, P) to a measurable space (Ψ, G). For B ∈ G let Q(B) =
P(X^{-1}(B)). Then (Ψ, G, Q) is a probability space. The probability measure Q
is called the distribution of the random variable X and is said to be induced by
X (or by X from P).


PROOF. Since X^{-1}(Ψ) = Ω, Q(Ψ) = 1. Let B_1, B_2,... be a pairwise disjoint
sequence of members of G. Then X^{-1}(B_m) ∩ X^{-1}(B_n) = ∅ whenever m ≠ n.
Hence,

\[ Q\Bigl(\bigcup_{n=1}^{\infty} B_n\Bigr) = P\Bigl(X^{-1}\Bigl(\bigcup_{n=1}^{\infty} B_n\Bigr)\Bigr) = P\Bigl(\bigcup_{n=1}^{\infty} X^{-1}(B_n)\Bigr) = \sum_{n=1}^{\infty} P\bigl(X^{-1}(B_n)\bigr) = \sum_{n=1}^{\infty} Q(B_n). \]

□


Note that every distribution is a probability measure, and that every
probability measure is a distribution (induced by the identity function), so ‘distribution’
and ‘probability measure’ are essentially synonymous terms that tend to be used
in somewhat different contexts.
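The formula Q(B) = P(X^{-1}(B)) translates directly into code when the underlying space is finite. The following sketch is ours: it assumes the 36-point model for two distinguishable dice, takes X to be the sum of the two numbers, and computes the induced distribution Q exactly as in the definition, by first forming the inverse image.

```python
from fractions import Fraction
from itertools import product

# Assumed underlying space: 36 equally likely ordered pairs.
OMEGA = list(product(range(1, 7), repeat=2))


def P(event):
    """Uniform probability measure on OMEGA; event must be a subset of OMEGA."""
    return Fraction(len(set(event)), len(OMEGA))


def X(w):
    """The random variable 'sum of the two dice'."""
    return w[0] + w[1]


def Q(B):
    """Distribution of X: Q(B) = P(X^{-1}(B))."""
    inverse_image = [w for w in OMEGA if X(w) in B]
    return P(inverse_image)


print(Q({7}))                      # probability that the sum equals 7
print(Q(set(range(2, 13))))        # Q of the whole target space
print(Q({2, 3}) + Q({11, 12}), Q({2, 3, 11, 12}))   # additivity on disjoint sets
```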




* Problem 3. Suppose that X and Y are random variables defined on the same 
probability space. Assume that the set 


\[ \{\omega : X(\omega) \ne Y(\omega)\} \]


is an event having probability 0. Prove that the distributions of X and Y are equal. 


The preceding problem describes one of many situations in which an event 
having probability 0 can be ignored. Events having probability 0 are called null 
events. The union of countably many null events, being a null event itself, can 
also often be ignored. But a word of caution: One cannot ignore an uncountable 
union of null events—and the temptation to do so comes in many disguises. An 
uncountable union of null events may have positive probability, may indeed be 
the entire probability space, or may not even be an event. 

Random variables X and Y for which the set {ω: X(ω) ≠ Y(ω)} is a null
event are said to be equal almost surely. The expression ‘almost surely’ is often
abbreviated a.s. We say that two events A and B are equal a.s. if both of the
events A \ B and B \ A are null events.


Problem 4. Prove that two events that are equal a.s. have the same probability. 


Problem 5. Show that events A and B in a probability space (Ω, F, P) are equal
a.s. if and only if

\[ P(A \cap B) = P(A) \vee P(B). \]


The property of being equal a.s. is an equivalence relation, whether applied 
to random variables or events. All random variables in an equivalence class 
have the same distribution and all events in an equivalence class have the same 
probability. (The converse is not true. Events can have the same probability 
even though they are not almost surely equal, and random variables can have the 
same distribution even though they are not almost surely equal.) It is sometimes 
more convenient to consider equivalence classes rather than individual random 
variables or events. 

Consider a random variable X on a probability space (Ω, F, P) with values in
the measurable space (Ψ, G). By Definition 5, X induces a probability measure Q
on the measurable space (Ψ, G) thereby transforming it into a probability space
(Ψ, G, Q), a probability space on which one can contemplate defining random
variables. These observations point the way to the next two propositions.


Proposition 6. Let (Ω, F), (Ψ, G), and (Θ, H) be measurable spaces, and let
X: Ω → Ψ and Y: Ψ → Θ be measurable functions. Then Y ∘ X is a measurable
function.


Problem 6. Prove the preceding proposition. 




Proposition 7. Let X be a random variable from a probability space (Ω, F, P)
to a measurable space (Ψ, G), and let Y be a measurable function from (Ψ, G) to a
measurable space (Θ, H). Let Q denote the probability measure on (Ψ, G) induced
by X. Then Y is a random variable on the probability space (Ψ, G, Q) with the
same distribution as that of the random variable Y ∘ X defined on the probability
space (Ω, F, P).


Problem 7. Prove the preceding proposition. 


2.2. R^d-valued random variables


Focusing first on R^1-valued random variables, we start with some random
variables that are easy to describe but, nevertheless, play an important role for
probability calculations and theory.


Definition 8. Let (Ω, F) be a measurable space. For each subset A of Ω,
define a function I_A on Ω by

\[ I_A(\omega) = \begin{cases} 1 & \text{if } \omega \in A \\ 0 & \text{otherwise.} \end{cases} \]

The function I_A is called the indicator function of A. A finite linear combination
of indicator functions is called a simple function.


Subsets C_j, j ∈ J, of a set Ψ form a partition of Ψ if: (i) ⋃_{j∈J} C_j = Ψ, (ii)
each C_j ≠ ∅, and (iii) C_j ∩ C_k = ∅ whenever j ≠ k. A partition is finite if the
corresponding set J is finite, and it is countable if J is countable.


Lemma 9. Let X: Ψ → R be a simple function:

\[ X = \sum_{i=1}^{m} a_i I_{A_i}. \]

Let G be a σ-field of subsets of Ψ such that A_i ∈ G for each i. Then there exists
a unique positive integer n, a unique partition {C_j: 1 ≤ j ≤ n} of Ψ, and unique
real numbers c_j, 1 ≤ j ≤ n, such that

\[ X = \sum_{j=1}^{n} c_j I_{C_j} \tag{2.1} \]

and c_j ≠ c_k whenever j ≠ k. Moreover, each C_j ∈ G.


PROOF. The image of X consists of finitely many values, the only possibilities
being 0 and sums of one or more of the a_i’s. Call the members of the image
c_1, c_2,..., c_n, and, for j = 1,2,...,n, set

\[ C_j = \{\psi : X(\psi) = c_j\}. \tag{2.2} \]

It is clear that (2.1) holds and that {C_j: 1 ≤ j ≤ n} is a partition of Ψ. On the
other hand, if (2.1) holds, then (2.2) holds because the numbers c_j are distinct.
For each j,

\[ C_j = \bigcup_{\{i_1,\dots,i_k\}:\ a_{i_1}+\dots+a_{i_k} = c_j} \Bigl[ \Bigl(\bigcap_{r=1}^{k} A_{i_r}\Bigr) \cap \Bigl(\bigcap_{i \ne i_r,\ 1 \le r \le k} A_i^c\Bigr) \Bigr]. \]

Therefore, C_j ∈ G. □


Corollary 10. Let (Ψ, G) be a measurable space. If f: Ψ → R is a simple
function that is a finite linear combination of indicator functions of members of
G, then f is a measurable function into (R, H) for any choice of the σ-field H.


Problem 8. Prove the preceding corollary. 


We may call the indicator function of an event an indicator random variable, 
and, in view of the preceding corollary, a finite linear combination of indicator 
random variables a simple random variable. 
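The passage in Lemma 9 from an arbitrary linear combination of indicator functions to the representation (2.1) with distinct coefficients amounts to grouping the points of the space by the value of the function. Here is a sketch for a finite space; the function name canonical_form and the representation of the terms as pairs (coefficient, set) are ours.

```python
from collections import defaultdict


def canonical_form(space, terms):
    """Group the points of a finite space by the value of a simple function.

    terms is a list of pairs (a_i, A_i); the function is the sum of a_i * I_{A_i}.
    The result maps each distinct value c_j to its level set C_j, as in (2.2).
    """
    levels = defaultdict(set)
    for psi in space:
        value = sum(a for a, A in terms if psi in A)   # value of the function at psi
        levels[value].add(psi)
    return dict(levels)


space = range(1, 7)
terms = [(2, {1, 2, 3}), (3, {3, 4})]      # the function 2 I_{1,2,3} + 3 I_{3,4}
for c, C in sorted(canonical_form(space, terms).items()):
    print(c, sorted(C))
```

The level sets are nonempty and pairwise disjoint and their union is the whole space, so they form a partition; the value 0 appears because the sets {1,2,3} and {3,4} do not cover the space.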

We now discuss various examples of random variables that are not simple. 


Example 1. Let Ω be the sample space for an infinite sequence of coin flips,
as described in Example 4 of Chapter 1. Let F be the σ-field generated by sets of
the form (1.2) and let P denote the unique probability measure on F satisfying
(1.1). (We will prove the existence and uniqueness of such a P in Chapter 7.)
Let X denote the function from (Ω, F, P) to (R, B) defined by

\[ X((\omega_1,\omega_2,\dots)) = 0.\omega_1\omega_2\dots_{\text{two}}, \]

where the symbol on the right represents a member of the interval [0,1] written
in binary notation. Let us show that X is a random variable and calculate its
distribution.

We consider the set X^{-1}((j2^{-n}, (j + 1)2^{-n}]), where j and n are nonnegative
integers. If j2^{-n} ≥ 1, then X^{-1}((j2^{-n}, (j + 1)2^{-n}]) = ∅ and is thus a member
of F. In the remaining cases j satisfies 0 ≤ j < 2^n. Let j = (ε_1ε_2...ε_n)_two be
the binary representation of j (written with possibly some superfluous leading
zeroes in order to have n binary digits). Then

\[ j2^{-n} = 0.\varepsilon_1 \dots \varepsilon_n 000\dots_{\text{two}} \quad\text{and}\quad (j+1)2^{-n} = 0.\varepsilon_1 \dots \varepsilon_n 111\dots_{\text{two}}. \]

Therefore, it is almost true that

\[ X^{-1}\bigl((j2^{-n}, (j+1)2^{-n}]\bigr) = \{(\omega_1,\omega_2,\dots) : \omega_i = \varepsilon_i \text{ for } 1 \le i \le n\}, \tag{2.3} \]

which is a member of F. The ‘almost’ arises for two reasons. First, the set on
the right contains an extra point, namely ω = (ε_1,...,ε_n,0,0,...). And second,
the set on the left contains an extra point because of the fact that there are
two different binary representations of (j + 1)2^{-n}. By Problem 11 of Chapter 1,
individual sample points constitute measurable sets, so these differences between
the two sets do not affect the measurability of the set on the left side. That X is
a random variable now follows from Problem 13 of Chapter 1 and Proposition 3
of this chapter.

The preceding argument that $X$ is a random variable also gives much infor-
mation about its distribution $Q$. Since the $P$-probability of the event on the
right side of (2.3) is $2^{-n}$ and the $P$-probability of the extra sample points on the
two sides equals 0,

$Q((j2^{-n}, (j+1)2^{-n}]) = 2^{-n}$.

By taking finite disjoint unions and using the additivity of $P$, it is easily shown
that

$Q((i2^{-n}, j2^{-n}]) = (j - i)2^{-n}$

for nonnegative integers $i, j, n$ such that $i < j \le 2^n$. Also, since individual
sample points have probability 0, one may add or delete the endpoints of these
intervals without changing their probabilities. Thus, for all intervals $I \subseteq [0,1]$
with binary rational endpoints, $Q(I)$ equals the length of $I$. It will develop, from
the theory in Chapter 7, that these values of $Q$ determine the function $Q$ on the
Borel sets $\mathcal{B}$ and that, in particular, the $Q$-probability of any subinterval of $[0,1]$
equals the length of that interval.

The probability measure Q is called Lebesgue measure on [0,1] and can be 
regarded as a generalization of the notion of length to sets that are not intervals. 
In probabilistic language, Q is called the uniform distribution on [0,1] and is 
likely the distribution someone has in mind when discussing, without clarifica- 
tion, the experiment of choosing a number at random from the unit interval. In 
fact, successive coin tosses constitute an effective but cumbersome way of ap-
proximately choosing such a random number. For instance, if the first four flips
are 'heads, tails, tails, heads', then we conclude, successively, that the random
number is a member of $[\frac{1}{2}, 1]$, of $[\frac{1}{2}, \frac{3}{4}]$, of $[\frac{1}{2}, \frac{5}{8}]$, and then of $[\frac{9}{16}, \frac{5}{8}]$.
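
The successive narrowing described in this example can be traced by a short computation. The following Python sketch is only an illustration: the function name `dyadic_interval` is hypothetical, and it assumes the convention that heads is recorded as the binary digit 1 and tails as 0.

```python
from fractions import Fraction

def dyadic_interval(flips):
    """Closed interval of numbers in [0, 1] that have a binary expansion
    beginning with the given digits (assumed: 1 = heads, 0 = tails)."""
    lo, hi = Fraction(0), Fraction(1)
    for digit in flips:
        mid = (lo + hi) / 2
        if digit == 1:
            lo = mid   # heads: the number lies in the upper half
        else:
            hi = mid   # tails: the number lies in the lower half
    return lo, hi

flips = [1, 0, 0, 1]   # heads, tails, tails, heads
for n in range(1, len(flips) + 1):
    lo, hi = dyadic_interval(flips[:n])
    print(f"after {n} flips: [{lo}, {hi}], length {hi - lo}")
```

Each additional flip halves the length of the interval, in agreement with the value $2^{-n}$ found for the distribution of $X$.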

The probability space ([0, 1], B, Q) arises often in probability theory. In this 
space, all countable sets are null events, so there is no essential change in the 
probability space if a countable subset of [0,1] is ignored. Thus, one also uses 
phrases such as “the probability space consisting of Lebesgue measure on the 
interval (0, 1]” and “the uniform distribution on (0, 1)”. 


The term ‘random number’ was used in the preceding example to refer to an R- 
valued random variable with a uniform distribution. In general, the term random 
number is synonymous with the term ‘R-valued random variable’, regardless of 
its distribution. 

The first problem in the following set is useful for proving that certain $\mathbb{R}$-
valued functions are random variables. The next five problems are relevant for
$\mathbb{R}^2$-valued random variables, and the one following that gives an example of an
$\mathbb{R}^d$-valued random variable for arbitrary $d$. The final problem of the section
indicates some methods of obtaining new random variables from old ones.


* Problem 9. Show that every increasing function from R to R is measurable. 


Problem 10. Let $(\Omega, \mathcal{F}, P)$ denote the usual probability space for an infinite se-
quence of coin flips. Let $X$ be the function from $\Omega$ to $(\mathbb{R}^2, \mathcal{B})$ defined by


$X((\omega_1, \omega_2, \dots)) = ((0.\omega_1\omega_3\omega_5\dots)_{\text{two}},\ (0.\omega_2\omega_4\omega_6\dots)_{\text{two}})$.


Do enough calculations and reasoning to become convinced that $X$ is a random
variable and that the distribution of $X$ is a generalization of area, restricted to
the unit square $[0,1]^2$. This distribution is called Lebesgue measure or uniform
distribution on $[0,1]^2$. Some people would prefer to call the random variable $X$ of
this example a random vector.


Problem 11. Prove that the uniform distribution on the unit square assigns 0 prob-
ability to the boundary of the square. How may this fact be used when defining
the uniform distribution on $[0,1)^2$, $(0,1]^2$, or $(0,1)^2$?


* Problem 12. For the random variable $X = (X_1, X_2)$ of Problem 10, calculate the
probability that $X_1 \vee X_2 > 2/3$; that is, calculate


$P(\{\omega\colon X_1(\omega) \vee X_2(\omega) > 2/3\})$.


Problem 13. Let $(\Omega, \mathcal{F}, P)$ be the probability space of Problem 10 of Chapter 1
for the rolling of two dice. Set


$X(\omega_1, \omega_2) = (\omega_1 \wedge \omega_2,\ \omega_1 \vee \omega_2)$.

Relate this random vector to Example 5 of Chapter 1.


* Problem 14. Let $\Omega$ be the unit square, $\mathcal{B}$ the Borel $\sigma$-field of subsets of $\Omega$, and $P$
the uniform distribution on $\Omega$. Let $X$ be the function on $(\Omega, \mathcal{B}, P)$ defined
for $\omega = (\omega_1, \omega_2)$ by

$X(\omega) = (\omega_1 \wedge \omega_2,\ \omega_1 \vee \omega_2)$.


Prove that $X$ is a random variable and describe its distribution.


Problem 15. Let $\Omega$ consist of all infinite sequences of 0's and 1's. Let $\mathcal{F}$ be the
$\sigma$-field generated by sets of the form (1.2) and let $P$ be the unique probability
measure on $\mathcal{F}$ satisfying (1.1). Fix a positive integer $d$ and let $\Psi$ denote the set of
all $d$-dimensional vectors of 0's and 1's. Let $\mathcal{G}$ denote the $\sigma$-field consisting of all
subsets of $\Psi$. Define the function $X$ from $\Omega$ to $\Psi$ by the formula


$X((\omega_1, \omega_2, \dots)) = (\omega_1, \dots, \omega_d)$.


Prove that $X$ is a random variable and that its distribution is the probability
measure introduced in Example 3 of Chapter 1.




Problem 16. Let $X_1, \dots, X_d$ be $\mathbb{R}$-valued functions defined on the same space.
Show that the vector $(X_1, \dots, X_d)$ is an $\mathbb{R}^d$-valued measurable function if and only
if each $X_i$ is measurable, and deduce that the following are $\mathbb{R}$-valued measurable
functions: the sum, product, max, and min of $X_1, \dots, X_d$. Hint: Proposition 4 and
Proposition 6 will be useful.


Notice that no knowledge of the underlying probability space is needed in 
order to do Problem 12. The only necessary information is the distribution of 
X. Since this situation is quite common, language like the following is often used: 
"Let $X$ be a random variable uniformly distributed on $[0,1]^2$." The existence of
an appropriate underlying probability space is implicit in such a statement. 


2.3. $\mathbb{R}^\infty$-valued random variables


The following problem, which involves the product topology of a countably in- 
finite number of factors, is a continuation of the last problem of the preceding 
section. 


Problem 17. Let $X_1, X_2, X_3, \dots$ be $\mathbb{R}$-valued functions defined on the same space.
Show that the $\mathbb{R}^\infty$-valued function $X = (X_1, X_2, X_3, \dots)$ is measurable if and only
if each $X_i$ is measurable.


An $\mathbb{R}^\infty$-valued random variable $X = (X_1, X_2, X_3, \dots)$ is also called a random
sequence.

It is easy to see that one can replace $\mathbb{R}$ and $\mathbb{R}^\infty$ in Problem 17 by $\overline{\mathbb{R}}$ and $\overline{\mathbb{R}}^\infty$,
respectively. That is, there is no essential difference between an infinite sequence
of $\mathbb{R}$- or $\overline{\mathbb{R}}$-valued measurable functions defined on the same measurable space
and an $\mathbb{R}^\infty$- or $\overline{\mathbb{R}}^\infty$-valued measurable function. The remainder of this section
will be devoted to $\overline{\mathbb{R}}$- and $\overline{\mathbb{R}}^\infty$-valued measurable functions. Of course,
our main concern is with measurable functions that also happen to be random
variables, but for these particular results, the presence of a probability measure
plays no essential role.

When we write

$\lim_{n \to \infty} X_n = X$

we mean

$\lim_{n \to \infty} X_n(\omega) = X(\omega)$ for all $\omega$.


Proposition 11. Let $(\Omega, \mathcal{F})$ be a measurable space, and let $(X_1, X_2, \dots)$ be
an increasing sequence of measurable $\overline{\mathbb{R}}$-valued functions defined on $(\Omega, \mathcal{F})$. Then
$X = \lim_{n \to \infty} X_n$ is measurable.


PROOF. By Proposition 3, it is enough to show that $X^{-1}((a, \infty])$ is a mea-
surable set for all real numbers $a$, since the collection of intervals of the form
$(a, \infty]$ generates the Borel $\sigma$-field of $\overline{\mathbb{R}}$ (see Problem 13 of Chapter 1). Since $X$ is
the increasing pointwise limit of the sequence $(X_1, X_2, \dots)$,


$X^{-1}((a, \infty]) = \bigcup_{n=1}^{\infty} X_n^{-1}((a, \infty])$.

Each of the sets in the union is measurable since each of the functions $X_n$ is
measurable. It follows that the union is measurable. □


Corollary 12. Let $(X_1, X_2, \dots)$ be a sequence of $\overline{\mathbb{R}}$-valued measurable func-
tions defined on a measurable space $(\Omega, \mathcal{F})$. Then the following functions are
measurable:


$\sup_n X_n$, $\inf_n X_n$, $\limsup_{n \to \infty} X_n$, $\liminf_{n \to \infty} X_n$.


Furthermore, if

$\lim_{n \to \infty} X_n$

exists, it is a measurable function.


Problem 18. Prove the preceding corollary. Hint: The supremum of any sequence 
of real numbers is the increasing pointwise limit of another appropriately defined 
sequence. 


The following lemma is useful for proving facts about $\overline{\mathbb{R}}^+$-valued measurable
functions, since it shows that all such functions can be seen as monotonically
increasing limits of measurable simple functions. See Theorem 15 of Chapter 4
for an example of such an application.


Lemma 13. An $\overline{\mathbb{R}}^+$-valued function $X$ defined on a measurable space $(\Omega, \mathcal{F})$
is measurable if and only if there exists a sequence $(X_1, X_2, \dots)$ of measurable
simple functions defined on $(\Omega, \mathcal{F})$ such that $0 \le X_1 \le X_2 \le \dots$, and


$\lim_{n \to \infty} X_n = X$.


PROOF. The 'if' portion of the lemma is contained in Proposition 11. For the
'only if' part, suppose that $X$ is a measurable $\overline{\mathbb{R}}^+$-valued function. For $k, n \ge 1$,
let


$A_{k,n} = \{\omega\colon (k-1)/2^n \le X(\omega) < k/2^n\}$ and $B_n = \{\omega\colon X(\omega) \ge n\}$


and

$X_n = n I_{B_n} + \frac{1}{2^n} \sum_{k=1}^{n2^n} (k-1) I_{A_{k,n}}$.


Since $X$ is a measurable function, all of the sets $B_n$ and $A_{k,n}$ are measurable.
It follows that each $X_n$ is a nonnegative measurable simple function. It is easily
checked that $X_n \nearrow X$ as $n \to \infty$. □
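
The construction in this proof is easy to experiment with numerically. The Python sketch below is an illustration only: the function name `simple_approx` is hypothetical, and it assumes the reading of the proof in which $X_n$ equals $n$ on the set where $X \ge n$ and equals $(k-1)/2^n$ on the set where $(k-1)/2^n \le X < k/2^n$. It evaluates $X_n$ at a sample point where $X$ takes the value `x`.

```python
import math

def simple_approx(x, n):
    """Value of the n-th approximating simple function at a point where
    the nonnegative (possibly infinite) function X takes the value x."""
    if x >= n:
        return float(n)                    # the point lies in B_n
    # Otherwise the point lies in A_{k,n} with k - 1 = floor(x * 2^n).
    return math.floor(x * 2**n) / 2**n

x = math.pi
for n in range(1, 9):
    print(n, simple_approx(x, n))
```

For a finite value `x` and any `n` exceeding it, the approximation is within $2^{-n}$ of `x`; at a point where $X = \infty$ the values are $1, 2, 3, \dots$, which increase to $\infty$.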




2.4. Further examples 


In this section, we look at examples of random variables that are not $\overline{\mathbb{R}}$- or
$\overline{\mathbb{R}}^d$-valued for any $d = 1, 2, \dots, \infty$. We look first at examples of random
variables whose values are continuous functions.

Let $C[a,b]$ denote the space of continuous real-valued functions defined on
the interval $[a,b]$. We regard $C[a,b]$ as a metric space (see Appendix B), with
the distance between two functions being the maximum of the absolute value of
their difference. It therefore makes sense to talk about $\mathcal{B}$, the $\sigma$-field of Borel
subsets of $C[a,b]$.


Example 2. Let $(\Omega, \mathcal{F}, P)$ denote the usual probability space for an infinite
sequence of coin flips. For each positive integer $k$ we will define a random variable
$X^{(k)} = (X_t^{(k)},\ t \in [0,1])$ from $(\Omega, \mathcal{F}, P)$ to $(C[0,1], \mathcal{B})$. For each $k$ and each
$\omega = (\omega_1, \omega_2, \dots)$ we must specify the values $X_t^{(k)}(\omega)$ of the continuous function
$t \mapsto X_t^{(k)}(\omega)$ at each $t \in [0,1]$. We first specify these values for $t$ equal to a
multiple of $1/k$: for $j = 0, 1, \dots, k$, let


$X_{j/k}^{(k)}(\omega) = \frac{1}{\sqrt{k}} \sum_{i=1}^{j} (2\omega_i - 1)$.


For all other values of $t$, the value of $X_t^{(k)}(\omega)$ is determined by linear interpola-
tion. The graphs of $X^{(k)}(\omega)$ are shown in Figures 2.1, 2.2, and 2.3, for


$\omega = (1,1,0,1,0,0,0,1,0,0,1,1,1,1,0,1,\dots)$


and $k = 4$, 9, and 16.

For any subset $B$ of $C[0,1]$, $[X^{(k)}]^{-1}(B)$ is a finite union of sets of the form
(1.2). Hence $X^{(k)}$ is a random variable; but many would prefer to use one of
the terms random function or stochastic process to emphasize that the target of
$X^{(k)}$ is a set of functions. The term 'stochastic process' is especially prevalent
when $t$ represents time.

In the context of coin-tossing, the sum in the expression for $X_{j/k}^{(k)}(\omega)$ equals
the difference between the number of heads and the number of tails after $j$ tosses.
The quantity $\sqrt{k}$ serves to give a common scale to the random functions $X^{(k)}$,
$k = 1, 2, \dots$. We can be more precise about this scaling factor after we introduce
'standard deviation' in Chapter 5 (see Problem 6 of that chapter).

Let $Q_k$ denote the distribution induced on $(C[0,1], \mathcal{B})$ by $X^{(k)}$. For each
$k$, $(C[0,1], \mathcal{B}, Q_k)$ is a probability space. The sample space and $\sigma$-field are the
same for the various $k$ but the probability measures are different. We can use
the calculation done in Example 3 of Chapter 1 to conclude, for instance, that


$Q_k(\{g\colon g(1) = 0\}) = \binom{k}{k/2} 2^{-k}$ if $k$ is even, and $Q_k(\{g\colon g(1) = 0\}) = 0$ otherwise.
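
The random functions of this example are straightforward to compute for a given sequence of flips. The following Python sketch is an illustration under stated assumptions: the helper names `grid_values`, `path_value`, and `prob_end_at_zero` are hypothetical, the values at multiples of $1/k$ are taken to be the partial sums of $2\omega_i - 1$ divided by $\sqrt{k}$, and the probability that the path ends at 0 is taken to be the binomial expression for exactly $k/2$ heads in $k$ flips.

```python
import math

def grid_values(omega, k):
    """Values of the path at t = 0, 1/k, 2/k, ..., 1."""
    values, partial_sum = [0.0], 0
    for i in range(k):
        partial_sum += 2 * omega[i] - 1      # +1 for a 1, -1 for a 0
        values.append(partial_sum / math.sqrt(k))
    return values

def path_value(omega, k, t):
    """Value of the path at an arbitrary t in [0, 1], by linear interpolation."""
    values = grid_values(omega, k)
    position = t * k
    j = min(int(position), k - 1)            # left grid index
    fraction = position - j
    return values[j] + fraction * (values[j + 1] - values[j])

def prob_end_at_zero(k):
    """Probability that the path equals 0 at t = 1."""
    return math.comb(k, k // 2) / 2**k if k % 2 == 0 else 0.0

omega = (1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1)
print(grid_values(omega, 4))
print(path_value(omega, 4, 0.375))
print([prob_end_at_zero(k) for k in (4, 9, 16)])
```

Only the first $k$ coordinates of `omega` are used, which reflects the fact that each event of the form $[X^{(k)}]^{-1}(B)$ depends on finitely many flips.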




FIGURE 2.1. The graph of $X^{(4)}((1,1,0,1,\dots))$


FIGURE 2.2. The graph of $X^{(9)}((1,1,0,1,0,0,0,1,0,\dots))$


FIGURE 2.3. The graph of $X^{(16)}((1,1,0,1,0,0,0,1,0,0,1,1,1,1,0,1,\dots))$




* Problem 19. In the context of the preceding example, calculate


$Q_k(\{g\colon g(1/2) = 0 \text{ and } g(1) = \sqrt{k}/2\})$.


If one were interested, say, in the probability space $(C[0,1], \mathcal{B}, Q_6)$, one could
study it without ever introducing the random variables $X^{(k)}$ or the underlying
coin-flip probability space $(\Omega, \mathcal{F}, P)$. Indeed, before the 1930's, much probability
and statistics were done without any systematic use of random variables. The
preceding example does, however, indicate two good reasons for using random
variables. First, it is easier to think in terms of sequences of coin flips for
the construction of the probability space, rather than in terms of randomly
choosing a function of a certain type. Second, defining several different random
variables on the same probability space makes it possible to use one underlying
experiment to study several probability measures simultaneously. For example,
in Chapter 19, we will be interested in calculating the limit as $k \to \infty$ of $Q_k$, and
only one underlying probability space, namely the space $(\Omega, \mathcal{F}, P)$ of Example 2,
will be needed.

We have seen so far that the target of a random variable can consist of num-
bers, vectors, sequences, or functions, in which cases one may use terms that are
more specific than 'random variable', such as 'random number', 'random vector',
'random sequence', and 'random function'. The forthcoming Example 3 treats
a random compact set. In preparation we construct a metric space whose mem-
bers are compact sets. Let $\Xi$ be a metric space with metric $\rho$, and let $\Psi$ be the
collection of compact subsets of $\Xi$. For members $U$ and $V$ of $\Psi$, set


$d(U, V) = \frac{1}{2} [\max\{\min\{\rho(u,v)\colon u \in U\}\colon v \in V\}$
$\qquad + \max\{\min\{\rho(u,v)\colon v \in V\}\colon u \in U\}]$.
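
For finite subsets of the Euclidean plane, the displayed formula can be evaluated directly. The Python sketch below is an illustration only: the function names are hypothetical, and it assumes that the leading factor in the displayed formula is one half, so that $d$ is the average of the two 'directed' distances.

```python
import math

def directed_distance(U, V):
    """Largest distance from a point of V to its nearest point of U."""
    return max(min(math.dist(u, v) for u in U) for v in V)

def hausdorff_average(U, V):
    """Half the sum of the two directed distances (assumed form of d)."""
    return 0.5 * (directed_distance(U, V) + directed_distance(V, U))

U = [(0.0, 0.0), (1.0, 0.0)]
V = [(0.0, 0.0)]
print(directed_distance(U, V), directed_distance(V, U), hausdorff_average(U, V))
```

The two directed distances differ in this small case, which shows why the formula combines both of them: neither one alone is symmetric in $U$ and $V$.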


Problem 20. Prove that the function $d$ is a metric on $\Psi$. This metric is called the
Hausdorff metric on $\Psi$.


* Problem 21. Calculate the three Hausdorff distances between the various pairs of
the following subsets of the Euclidean plane:


$\{(x_1, x_2)\colon -1 \le x_1 \le 0,\ |x_2| \le 1\}$,
$\{(x_1, x_2)\colon 0 \le x_1 \le 1,\ x_2 = 0\}$,
$\{(x_1, x_2)\colon x_1 = -1,\ |x_2| \le 1\}$.


In the following example the random variables have values that are compact 
subsets of the Euclidean plane. In fact, the values are line segments, so we will 
work with a subspace of the metric space treated in the preceding problem. Of 
course, the metric for the subspace has the same formula as the metric for the 
whole space. 




Example 3. Let us consider the experiment of choosing at random a chord
of the unit circle in $\mathbb{R}^2$. Assume that the circle is given parametrically by
$(\cos 2\pi t, \sin 2\pi t)$, $0 \le t < 1$.

Here is one possible interpretation of 'at random'. Let $\Omega$ be the unit square
$[0,1)^2$. Let $\mathcal{A}$ denote the Borel $\sigma$-field of subsets of $\Omega$ and let $P$ be the uniform
distribution on $\Omega$ (see Problem 11). Let $\Psi$ be the space of chords of the given
circle and let $\mathcal{B}$ denote the Borel $\sigma$-field of subsets of $\Psi$, with $\Psi$ being regarded as
a subspace of the metric space of all compact subsets of $\mathbb{R}^2$ (with the Hausdorff
metric). For $\omega = (\alpha, \beta) \in \Omega$, let $X_1(\omega)$ denote the line segment having endpoints
$(\cos 2\pi\alpha, \sin 2\pi\alpha)$ and $(\cos 2\pi\beta, \sin 2\pi\beta)$. Notice that $X_1(\omega)$ is a chord of the
circle of interest for any choice of $\omega$, provided we allow for the degenerate case
of a chord being a single point.

Here are two more interpretations of 'at random', using the same spaces
$(\Omega, \mathcal{A}, P)$ and $(\Psi, \mathcal{B})$. Let $X_2(\omega)$ be the chord that both passes through the point
whose polar coordinates are $(2\alpha - 1, \pi\beta)$ and is perpendicular to the line segment
from the origin to that point. In case $\alpha = 1/2$, this line segment consists of a
single point, but we can still define its direction to be $\pi\beta$. Let $X_3(\omega)$ be the
chord which has $(\cos 2\pi\alpha, \sin 2\pi\alpha)$ for one endpoint and whose angle measured
counterclockwise from the positive horizontal direction is $\pi\beta$.

A fourth interpretation can be obtained by letting $\Omega$ denote the interior of the
circle in question. Let $P$ denote the uniform distribution on $\Omega$ (meaning?) and
let $\mathcal{A}$ be the Borel $\sigma$-field of subsets of $\Omega$. Let $X_4(\omega)$ denote the chord whose
midpoint is $\omega$.


Example 4. Let us look more closely at the first interpretation in the pre-
ceding example of the experiment of choosing a random chord of the unit circle.
We first ask whether the function $X_1$ defined in that example is a measurable
function from $(\Omega, \mathcal{A}, P)$ to the space $(\Psi, \mathcal{B})$. We leave it to the reader to prove
that a sequence of chords in $\Psi$ converges in the Hausdorff metric if and only if
the endpoints of these chords converge in $\mathbb{R}^2$. This fact makes it natural for us to
include the degenerate case of chords of length 0 in order to make $\Psi$ a complete
metric space. (However, this is purely a matter of taste. There is nothing that
compels us to complete $\Psi$.) Our description of convergence in $\Psi$ should make
it clear that $X_1$ is a continuous function from $\Omega$ to $\Psi$, so $X_1$ is measurable and
may be legitimately called a 'random set'.

Let us calculate the probability of the event that $X_1$ intersects both the pos-
itive vertical axis and the negative horizontal axis. For definiteness, we exclude
the origin from these two sets. As illustrated in Figure 2.4, this event equals the
union of


$A = \{\omega = (\alpha, \beta)\colon \frac{1}{2} \le \alpha < \frac{3}{4} \text{ and } \alpha - \frac{1}{2} < \beta \le \frac{1}{4}\}$


and the reflection of $A$ about the line $\alpha = \beta$ in $\mathbb{R}^2$. These two sets are disjoint,
and they each have area 1/32, so the probability is 1/16.
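
The value 1/16 can be checked numerically. The Python sketch below is an illustration only: the function names are hypothetical, the chord is taken to join the points of the circle at angles $2\pi\alpha$ and $2\pi\beta$, and the area of the event is approximated by counting midpoints of a regular grid on the unit square rather than by exact integration.

```python
import math

def crosses_both_axes(alpha, beta):
    """True if the chord with endpoints at angles 2*pi*alpha and 2*pi*beta
    meets both the positive vertical and the negative horizontal axis."""
    x1, y1 = math.cos(2 * math.pi * alpha), math.sin(2 * math.pi * alpha)
    x2, y2 = math.cos(2 * math.pi * beta), math.sin(2 * math.pi * beta)

    def other_coordinate_at_zero(p1, q1, p2, q2):
        """q-values at the points of the segment where the p-coordinate is 0."""
        if p1 * p2 > 0:
            return []                  # the segment never reaches p = 0
        if p1 == p2:
            return [q1, q2]            # the whole segment lies on the line p = 0
        s = p1 / (p1 - p2)             # parameter of the crossing point
        return [q1 + s * (q2 - q1)]

    hits_positive_vertical = any(
        y > 0 for y in other_coordinate_at_zero(x1, y1, x2, y2))
    hits_negative_horizontal = any(
        x < 0 for x in other_coordinate_at_zero(y1, x1, y2, x2))
    return hits_positive_vertical and hits_negative_horizontal

def estimate_probability(n):
    """Fraction of the n*n grid midpoints of the unit square in the event."""
    count = sum(
        crosses_both_axes((i + 0.5) / n, (j + 0.5) / n)
        for i in range(n) for j in range(n))
    return count / n**2

print(estimate_probability(400), 1 / 16)
```

The grid estimate differs from 1/16 only through the cells that straddle the boundary of the event, so the error shrinks as the grid is refined.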




FIGURE 2.4. Random chord: four sample points


* Problem 22. Repeat the work done in the preceding example for each of the other 
three interpretations found in Example 3. That is, in each case, decide whether 
one-point subsets of the circle are to be considered chords, prove the necessary 
measurability, and compute the probability of the event that the random chord 
intersects the positive vertical and negative horizontal axes. Also comment on and 
fix an ambiguity in the definition of X4. 


CHAPTER 3 
Distribution Functions 


The main purpose of this chapter is to classify all probability measures on the 
measurable space (R,B). We will accomplish this task by establishing a one- 
to-one correspondence between such probability measures and a certain class of 
functions, known as ‘distribution functions’. Many important probability mea- 
sures and their corresponding distribution functions will be identified, including 
the binomial, normal, Poisson, gamma, and beta families of probability mea- 
sures. 


3.1. Basic theory 
We introduce the class of functions to which the preceding paragraph refers. 


Definition 1. A real-valued function $F$ defined on $\mathbb{R}$ is called a distribution
function for $\mathbb{R}$ if it is increasing and right-continuous and satisfies


$\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to \infty} F(x) = 1$.

Often the phrase 'for $\mathbb{R}$' will be omitted.


Let $Q$ denote a probability measure on the measurable space $(\mathbb{R}, \mathcal{B})$. We want
to show that the function $x \mapsto Q((-\infty, x])$ is a distribution function. For this
purpose we need the following useful result.


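Theorem 2. [Continuity of Measure] Let $(\Omega, \mathcal{F}, P)$ be a probability space,
and let $(A_1, A_2, \dots)$ be a sequence of events in $\mathcal{F}$. If $A_1 \subseteq A_2 \subseteq \dots$, then


$P\bigl(\bigcup_{n=1}^{\infty} A_n\bigr) = \lim_{n \to \infty} P(A_n)$.


If $A_1 \supseteq A_2 \supseteq \dots$, then


$P\bigl(\bigcap_{n=1}^{\infty} A_n\bigr) = \lim_{n \to \infty} P(A_n)$.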


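PROOF. To prove the first assertion we suppose that $A_1 \subseteq A_2 \subseteq \dots$. Let
$B_1 = A_1$ and $B_m = A_m \setminus A_{m-1}$ for $m = 2, 3, \dots$. Note that


$A_n = \bigcup_{m=1}^{n} B_m$ and $\bigcup_{n=1}^{\infty} A_n = \bigcup_{m=1}^{\infty} B_m$.


Since the $B_m$'s are disjoint, we can use the countable additivity of $P$ to make
the following calculation:


$P\bigl(\bigcup_{n=1}^{\infty} A_n\bigr) = P\bigl(\bigcup_{m=1}^{\infty} B_m\bigr) = \sum_{m=1}^{\infty} P(B_m)$
$\qquad = \lim_{n \to \infty} \sum_{m=1}^{n} P(B_m) = \lim_{n \to \infty} P\bigl(\bigcup_{m=1}^{n} B_m\bigr) = \lim_{n \to \infty} P(A_n)$.


This proves the first assertion of the theorem.
Now suppose that $A_1 \supseteq A_2 \supseteq \dots$. Then $A_1^c \subseteq A_2^c \subseteq \dots$. Therefore, we can
apply the first part of the theorem to the sequence $(A_1^c, A_2^c, \dots)$ to obtain the
following:


$P\bigl(\bigcap_{n=1}^{\infty} A_n\bigr) = 1 - P\bigl(\bigcup_{n=1}^{\infty} A_n^c\bigr)$
$\qquad = 1 - \lim_{n \to \infty} P(A_n^c)$
$\qquad = 1 - \lim_{n \to \infty} [1 - P(A_n)] = \lim_{n \to \infty} P(A_n)$. □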
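
As a quick illustration of the first assertion (a worked instance chosen here for concreteness, not taken from the text), let $P$ be the uniform distribution on $(0,1)$ and consider the increasing events $A_n = (0, 1 - \frac{1}{n+1}]$, whose union is all of $(0,1)$:

```latex
\[
P(A_n) = 1 - \frac{1}{n+1} \nearrow 1
       = P\bigl((0,1)\bigr)
       = P\Bigl(\bigcup_{n=1}^{\infty} A_n\Bigr)
\qquad \text{as } n \to \infty .
\]
```

Taking complements within $(0,1)$ gives the decreasing events $(1 - \frac{1}{n+1}, 1)$, whose probabilities $\frac{1}{n+1}$ decrease to 0, the probability of their empty intersection.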


Proposition 3. Let $Q$ be a probability measure on $(\mathbb{R}, \mathcal{B})$. Then the function
$F(x) = Q((-\infty, x])$ is a distribution function.


PROOF. Note that

$x \le y \implies (-\infty, x] \subseteq (-\infty, y] \implies Q((-\infty, x]) \le Q((-\infty, y])$.


Hence, $F$ is an increasing function. To prove right continuity, fix a real number
$x$ and let $(x_1, x_2, \dots)$ be a decreasing sequence which converges to $x$. Since
$(-\infty, x_1] \supseteq (-\infty, x_2] \supseteq \dots$, Theorem 2 implies that


$\lim_{n \to \infty} Q((-\infty, x_n]) = Q\bigl(\bigcap_{n=1}^{\infty} (-\infty, x_n]\bigr)$
$\qquad = Q((-\infty, x])$.


Thus, $F$ is right-continuous. The same reasoning, with $x_n \searrow -\infty$, shows that
$F$ has the desired behavior at $-\infty$. For the behavior at $\infty$, let $x_n \nearrow \infty$ and use
the Continuity of Measure Theorem again:


$\lim_{n \to \infty} Q((-\infty, x_n]) = Q\bigl(\bigcup_{n=1}^{\infty} (-\infty, x_n]\bigr) = Q(\mathbb{R}) = 1$. □


n=1 


If a distribution Q and a distribution function F are related as in the previous 
theorem, then we will call F the distribution function of Q. If X is a random 
variable with distribution Q, then we will also call F the distribution function 
of X. 


Problem 1. Show that the distribution function of the uniform distribution on $(0,1]$
is given by

$F(x) = 0$ if $x \le 0$,
$F(x) = x$ if $0 < x \le 1$,
$F(x) = 1$ if $1 < x$.


What is the distribution function for the uniform distribution on $(0,1)$?


Problem 2. Give a precise description of the probability measure $Q$ on $(\mathbb{R}, \mathcal{B})$ that
has distribution function $F$ given by


$F(x) = 0$ if $x < 0$,
$F(x) = 1$ if $x \ge 0$.


Also find a probability space $(\Omega, \mathcal{F}, P)$ and a real-valued random variable $X$ de-
fined on $(\Omega, \mathcal{F}, P)$ that has distribution function $F$. The probability measure $Q$ is
sometimes called the unit point mass or the delta distribution at 0.


The preceding exercise is an example of the converse of Proposition 3: For 
each distribution function $F$ there exists a unique probability measure $Q$ on $(\mathbb{R}, \mathcal{B})$ such
that $Q((-\infty, x]) = F(x)$. This fact is included in the next theorem, which also
provides a recipe for constructing a random variable with distribution function 
F. From a logical point of view, this result belongs in Chapter 7, since the 
necessary tools for proving the existence and uniqueness of probability measures 
are to be found there. However, if the reader is willing to accept uniqueness in 
general and the existence of the uniform distribution on (0,1) in particular, then 
the characterization of all other distributions on (R, B) can be derived using the 
concepts already introduced. 


Proposition 4. Let $F$ be a distribution function. Then there exists a unique
probability measure $Q$ on $(\mathbb{R}, \mathcal{B})$ such that $Q((-\infty, x]) = F(x)$. Moreover, a
random variable $X$ with distribution function $F$ can be constructed as follows:
Let $\Omega = (0,1)$, let $P$ be the uniform distribution on $\Omega$, and define


(3.1) $X(\omega) = \inf\{x\colon F(x) \ge \omega\}$, $0 < \omega < 1$.




PROOF. The uniqueness will follow from Theorem 3 of Chapter 7. We as- 
sume the existence of the uniform distribution P on (0,1), which follows from 
Theorem 14 of Chapter 7. Further details about the existence of the uniform 
distribution are given in Example 1 of the same chapter. 

By Problem 9 of Chapter 2, $X$ is a random variable. Let $Q$ be the distribution
of $X$. We will complete the proof of this theorem by showing that $F$ is the
distribution function of $Q$, or in other words, that


$F(y) = P(\{\omega\colon X(\omega) \le y\})$


for all $y \in \mathbb{R}$. Since $X$ is an increasing function on $(0,1)$, the event $A =
\{\omega\colon X(\omega) \le y\}$ is an interval with endpoints 0 and $\sup A$. Under the uni-
form distribution, the probability of any sub-interval of $(0,1)$ is the length of
that sub-interval, so we want to show that $F(y) = \sup A$.

The definition of $X$ and the right continuity of $F$ imply that $F(X(\omega)) \ge \omega$;
so, if $\omega \in A$, then $F(y) \ge F(X(\omega)) \ge \omega$. Hence, $F(y)$ is an upper bound
of $A$. On the other hand, $F(y) \in A$, because $X(F(y)) \le y$. It follows that
$F(y) = \sup A$. □


The relationship between F and X given in the preceding result is most easily 
understood when F is strictly increasing and continuous. In this case, X and F 
are inverse functions of each other and X is strictly increasing and continuous. 
In general X, defined by (3.1), is left-continuous. Jumps of F correspond to 
intervals of constancy of X, and bounded intervals of constancy of F correspond 
to jumps of X. Unbounded intervals of constancy of F correspond to finite 
limits of X at 0 and 1. It has become quite common to refer to X as the ‘left- 
continuous inverse’ of F and F as the ‘right-continuous inverse’ of X, even in 
the cases where there are jumps or intervals of constancy. Figure 3.1 shows an F 
that has both jumps and intervals of constancy. The corresponding X is shown 
below the graph of F, with its domain pictured vertically. 
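
The correspondence between jumps of $F$ and intervals of constancy of $X$ can be seen concretely for a distribution with finitely many atoms. The Python sketch below is an illustration only: the function name `quantile` and the three-point distribution are hypothetical choices, and the code assumes that (3.1) is read as $X(\omega) = \inf\{x\colon F(x) \ge \omega\}$.

```python
from fractions import Fraction

def quantile(atoms, probs, w):
    """X(w) = inf{x : F(x) >= w} for the distribution placing mass probs[i]
    at atoms[i]; the atoms must be listed in increasing order."""
    cumulative = Fraction(0)
    for x, p in zip(atoms, probs):
        cumulative += p                # cumulative equals F(x) at the atom x
        if cumulative >= w:
            return x
    raise ValueError("w must lie in (0, 1) and probs must sum to 1")

# Hypothetical example: mass 1/4 at -1, mass 1/2 at 0, mass 1/4 at 2.
atoms = [-1, 0, 2]
probs = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]

for w in (Fraction(1, 10), Fraction(1, 4), Fraction(3, 10),
          Fraction(3, 4), Fraction(4, 5)):
    print(w, quantile(atoms, probs, w))
```

Here $F$ jumps by 1/2 at 0, and correspondingly $X$ is constant, equal to 0, on the interval $(1/4, 3/4]$ of length 1/2. The fact that $X$ takes the value at the left of each of its jumps (for instance $X(1/4) = -1$) reflects its left-continuity.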


* Problem 3. Prove that $X$ as defined in the preceding theorem is left-continuous
and satisfies $X(\omega) = \sup\{x\colon F(x) < \omega\}$.


Problem 4. Discuss the options of making either X right-continuous or F left- 
continuous or both in Proposition 4. 


Problem 5. Let $X$ be an $\mathbb{R}$-valued random variable with distribution function $F$.
Prove that for all real numbers $a < b$,


$P(\{\omega\colon X(\omega) \in (a,b]\}) = F(b) - F(a)$.


Find analogous formulas involving intervals of the form $(a,b)$, $[a,b)$, and $[a,b]$. For
$x \in \mathbb{R}$, show that

$P(\{\omega\colon X(\omega) = x\}) = F(x) - F(x-)$.

As a consequence, conclude that if $F$ is continuous, then the events $\{\omega\colon X(\omega) = x\}$
are all null events.




FIGURE 3.1. Distribution function and corresponding random variable


3.2. Examples of distributions 


In the remainder of this chapter, we will illustrate Proposition 3 and Proposi- 
tion 4 by introducing, through examples and exercises, some of the more impor- 
tant distributions on (R, B). 


Problem 6. [Delta distributions] If $X$ is equal to a constant $a$, show that the dis-
tribution function of $X$ is given by

$F(x) = 0$ if $x < a$,
$F(x) = 1$ if $x \ge a$.

The distribution of $X$ is called the delta distribution or unit point mass at $a$. Often
$\delta_a$ will be used for the delta distribution at $a$.


Problem 7. [Bernoulli distributions] Fix $p \in [0,1]$, and let $X$ be a random variable
that equals 1 with probability $p$ and equals 0 with probability $1 - p$. Calculate
the distribution function $F$ of $X$. Conversely, starting with $F$, construct a random
variable whose distribution function is $F$. Hint: See Figure 3.2.




FIGURE 3.2. Bernoulli random variable defined on $(0,1)$


* Problem 8. [Cauchy distribution] For $x \in \mathbb{R}$ let


$F(x) = \frac{1}{2} + \frac{1}{\pi} \arctan x$

and let $Q$ be the corresponding distribution. It is easy to check that $F$ is a con-
tinuous distribution function. Calculate a random variable $X$ with distribution
function $F$ by using (3.1). For an arbitrary interval $[a,b)$ write $Q([a,b))$ in the
form $\int_a^b f$ for an appropriate $f$. Do the same for intervals of the form $[a,b]$, $(a,b)$,
and $(a,b]$. In any of these, is it permissible to use $a = -\infty$ or $b = \infty$? At the end
of Example 1 of Chapter 2, a method was described for approximately choosing a 
random number according to the uniform distribution by using a sequence of coin 
flips. Describe how to transform that method, using the construction in Propo- 
sition 4, to obtain a procedure for approximately choosing a Cauchy-distributed 
random number. 


Problem 9. Define $X$ on the probability space $((0,1), \mathcal{B}, P)$, where $P$ denotes
Lebesgue measure, by

$X(\omega) = \sqrt{5} \tan\bigl(\pi\omega - \frac{\pi}{2}\bigr)$.


Find the distribution function of $X$.


Problem 10. For the coin-flip probability space of Example 4 of Chapter 1, let

$X((\omega_1, \omega_2, \dots)) = \inf\{i\colon \omega_i = 0\}$

if $\omega_i = 0$ for some $i$, and let

$X((1,1,1,1,\dots)) = 23$.

Prove that $X$ is a random variable and calculate its distribution function.


Problem 11. [Geometric distributions] Fix $p \in [0,1)$ and let $X$ be a random vari-
able which, for each nonnegative integer $x$, equals $x$ with probability $(1 - p)p^x$,
where $0^0$ is understood to equal 1. Calculate the distribution function $F$ of $X$.
Conversely, starting with $F$, use the methods of this chapter to construct a ran-
dom variable whose distribution function is $F$.




3.3. Some descriptive terminology 


In this section we introduce some definitions that are useful in comparing and 
describing different distributions. 


Definition 5. Let $X$ and $Y$ be $\mathbb{R}$-valued random variables. Then $Y$ is of the
same type as $X$ if $Y$ has the same distribution as $aX + b$ for some constants
$a \in (0, \infty)$ and $b \in \mathbb{R}$. It is of the same strict type if $b$ can be chosen equal to 0.


We apply the phrase of the same type to distributions and distribution func- 
tions of random variables as well as to the random variables themselves. Thus, 
two distributions are of the same type if they are the distributions of random 
variables of the same type. 


* Problem 12. Explain why two distribution functions $F_1$ and $F_2$ are of the same
type if and only if


(3.2) $F_2(x) = F_1((x - b)/a)$


for some constants $a \in (0, \infty)$ and $b \in \mathbb{R}$ and all $x$. Also, explain why replacing
$b$ by 0 gives necessary and sufficient conditions for the distributions to be of the
same strict type.


Problem 13. Prove that the relation of being of the same type is an equivalence 
relation. 


Problem 14. Show that the distribution function that is the answer to Problem 10 
is of the same type as a distribution function described in Problem 11. 


It is clear that the delta distributions are all of the same type. These distribu- 
tions are said to be of degenerate type. The delta distributions constitute three 
strict types: those at negative points, those at positive points, and the single 
delta distribution at 0. 


Proposition 6. Let $Y$ be an $\mathbb{R}$-valued random variable of nondegenerate type,
and let $a > 0$ and $b$ be constants. If $Y$ has the same distribution as $aY + b$, then
$a = 1$ and $b = 0$.


PROOF. Let $F$ denote the distribution function of $Y$ and let $(\Psi, \mathcal{G}, R)$ denote
the probability space on which $Y$ is defined. Define $X$, $\Omega$, and $P$ as in Proposi-
tion 4. By that proposition, the distribution function of $X$ is $F$ and thus $X$ has
the same distribution as $Y$. For any real number $y$,


$R(\{\psi \in \Psi\colon aY(\psi) + b \le y\}) = R(\{\psi \in \Psi\colon Y(\psi) \le \frac{y-b}{a}\})$
$\qquad = P(\{\omega \in \Omega\colon X(\omega) \le \frac{y-b}{a}\})$
$\qquad = P(\{\omega \in \Omega\colon aX(\omega) + b \le y\})$.


Thus $aX + b$ has the same distribution as $aY + b$, and, therefore, $X$ and $aX + b$
have the same distribution.

Since $X$ is increasing and left-continuous and since $a > 0$, $aX + b$ is also
increasing and left-continuous. The construction in Proposition 4 shows that
the distribution functions of $X$ and $aX + b$ are their 'right-continuous inverses'.
Since they both have the same distribution function, they have the same 'right-
continuous inverse'. It follows that they are the same function. Since their
common distribution is nondegenerate, they are not constant functions, so there
exist two members $\omega_1$ and $\omega_2$ of $(0,1)$ such that $X(\omega_1) \ne X(\omega_2)$. The constants
$a$ and $b$ must satisfy


$aX(\omega_1) + b = X(\omega_1)$
$aX(\omega_2) + b = X(\omega_2)$.


This can be regarded as a system of two linear equations in the unknowns $a$ and
$b$. Since $X(\omega_1) \ne X(\omega_2)$, the determinant of the coefficient matrix is nonzero,
so there exists a unique solution, which is obviously $a = 1$, $b = 0$. □


Problem 15. Show that if $Y$ is of degenerate type, then there exist constants $b \ne 0$
and $a > 0$ such that $Y$ has the same distribution as $aY + b$.


Problem 16. Show by example that Proposition 6 is false if we allow a < 0. 


Problem 17. Let Y be a R-valued random variable whose distribution is not the 
delta distribution at 0. Add whatever is necessary to Proposition 6 to prove that 
if a > 0 and Y has the same distribution as aY, then a = 1. 


Remark 1. We have introduced several named distributions or families of 
distributions: uniform, delta, Cauchy, Bernoulli, and geometric. We will be in- 
troducing other families of distributions later in this chapter and in other chap- 
ters. There are times when it is convenient to include in a family of distributions 
all those distributions that are of the same type as any member of the family. For 
example, the distribution of Problem 10 is often called a geometric distribution. 
This ambiguity will not usually cause any confusion. For example, when we use 
a phrase like “the family of Cauchy distributions” it is clear that we mean all 
distributions of the same type as the distribution introduced in Problem 8. If 
we want to be more precise, we can use language like “the standard geometric 
distribution with parameter p” when we wish to refer specifically to one of the 
distributions introduced in Problem 11. A phrase like “distributions of geomet- 
ric type” can be used when we want to make it clear that we are speaking of 
all those distributions that are of the same type as the ones in Problem 11. We 
warn the reader, however, that the word ‘standard’ does not have a generally 
accepted precise meaning in this context, and we will not try to give it one here. 
The term ‘family’ is also imprecise. For instance, the geometric distributions of
Problem 11 could all be said to belong to the same family, but those for different
$p$ are not of the same type. On the other hand, the “family of uniform distribution
functions” would usually denote the set of distribution functions that are of
the same type as the distribution function $x \mapsto (x \vee 0) \wedge 1$.


Definition 7. Let $(\Omega, \mathcal{F}, P)$ be a probability space such that $\Omega$ is a topological
space and $\mathcal{F}$ is the corresponding Borel $\sigma$-field. If there exists a closed set
$C \subseteq \Omega$ such that

(i) $P(C) = 1$ and

(ii) $P(C') < 1$ for all closed proper subsets $C'$ of $C$,

then $C$ is called the support of $P$. If the support of $P$ is contained in a set $D$,
then we say that $P$ is supported by $D$. In the case that $(\Omega, \mathcal{F}) = (\mathbb{R}, \mathcal{B})$, the
support of $P$ is also called the support of the distribution function corresponding
to $P$.


It can be shown that the support of a probability measure $P$ exists if $\Omega$ is
a sufficiently nice topological space. The following result shows that if $\Omega$ is
the real line, the support exists and can be identified explicitly in terms of the
distribution function.


Proposition 8. If $P$ is a probability measure on $(\mathbb{R}, \mathcal{B})$ with distribution
function $F$, then the support of $P$ is the set

\[
C = \{x \in \mathbb{R} : F(x + \varepsilon) - F(x - \varepsilon) > 0 \text{ for all } \varepsilon > 0\}.
\]


PROOF. We first show that $C$ is a closed set. Choose a point $y \notin C$. By the
definition of $C$, there exists an $\varepsilon > 0$ such that $F(y + \varepsilon) \le F(y - \varepsilon)$. It follows
from the fact that $F$ is increasing that $F$ is constant on the interval $[y - \varepsilon, y + \varepsilon]$.
Thus, by the definition of $C$, $(y - \varepsilon, y + \varepsilon) \subseteq C^c$. We have shown that every
point in $C^c$ has an open neighborhood that also lies in $C^c$, so $C$ is closed.

We have also shown that every point $y \notin C$ is the midpoint of an open
interval on which $F$ is constant. For each $y \notin C$, let $J_y$ be the maximal such
interval. (Clearly, $J_y$ exists since arbitrary unions of open intervals centered at
$y$ are themselves open intervals centered at $y$.) By Problem 5, $P(J_y) = 0$ for all
$y \notin C$. It is easily seen that

\[
C^c = \bigcup_{y \notin C,\ y \text{ rational}} J_y,
\]

so $C^c$ is a countable union of null events. Thus $P(C) = 1$.

It remains to show that if $C'$ is a closed proper subset of $C$, then $P(C') < 1$.
Let $C'$ be a closed proper subset of $C$, and let $x$ be a point in $C \setminus C'$. Since $C'$
is closed, there exists an $\varepsilon > 0$ such that $[x - \varepsilon, x + \varepsilon] \subseteq (C')^c$. By the definition
of $C$, $F(x + \varepsilon) - F(x - \varepsilon) > 0$. By Problem 5, $P([x - \varepsilon, x + \varepsilon]) > 0$, from which
it follows immediately that $P(C') < 1$. □
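The criterion in Proposition 8 can be explored numerically. The following is a minimal sketch, not part of the original text: it assumes a Bernoulli-type distribution function that jumps at 0 and 1 (the helper names `bernoulli_cdf` and `passes_support_test` are hypothetical), and it tests the inequality $F(x+\varepsilon) - F(x-\varepsilon) > 0$ only for finitely many values of $\varepsilon$. A failure for a single $\varepsilon$ proves that $x$ is not in the support; passing the finite test is merely evidence of membership.

```python
from fractions import Fraction


def bernoulli_cdf(p):
    """Distribution function of a variable equal to 1 with probability p, else 0.

    This parametrization is an assumption made for the illustration.
    """
    def F(x):
        if x < 0:
            return Fraction(0)
        if x < 1:
            return 1 - p
        return Fraction(1)
    return F


def passes_support_test(F, x, epsilons):
    """Check F(x + e) - F(x - e) > 0 for each listed e (finitely many only)."""
    return all(F(x + e) - F(x - e) > 0 for e in epsilons)


F = bernoulli_cdf(Fraction(1, 3))
epsilons = [Fraction(1, 10 ** k) for k in range(1, 8)]
candidates = [Fraction(-1), Fraction(0), Fraction(1, 2), Fraction(1), Fraction(2)]

# Points that survive the finite test; here these are exactly the two jump points.
support_guess = [x for x in candidates if passes_support_test(F, x, epsilons)]
print(support_guess)
```

Because $F$ is increasing, the difference $F(x+\varepsilon) - F(x-\varepsilon)$ is itself increasing in $\varepsilon$, so only small values of $\varepsilon$ matter in such a test.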




Problem 18. Prove that if the support of a probability measure P exists, then it 
is unique. 


Problem 19. Find the uniform distribution function whose support is $[c, d]$, where
$-\infty < c < d < \infty$.


Problem 20. Show that all Cauchy distributions have the same support. 


Problem 21. Let $X$ be an R-valued random variable, $a$ a positive constant, and $b$ a
real constant. Set $Y = aX + b$. Let $c$ and $d$ denote the infimum and the supremum,
respectively, of the support of the distribution of $X$. Prove that $ac + b$ and $ad + b$
are the infimum and supremum, respectively, of the support of the distribution of
$Y$. Make sure the proof encompasses the possibilities $c = -\infty$ and $d = \infty$.


Problem 22. For any R-valued random variable $X$ describe the support of the
distribution of $X^2$ in terms of the support of the distribution of $X$. Hint: Be
careful.


Problem 23. An R-valued random variable $X$ and its distribution are said to be
symmetric about a point $b \in \mathbb{R}$ if $X - b$ and $b - X$ have the same distribution.
The distribution of such a random variable X is also said to be symmetric about b. 
Reformulate this definition solely in terms of distribution functions. Also show that 
the standard Cauchy distribution is symmetric about 0. What other distributions 
introduced so far are symmetric about some point? 


Problem 24. Let $X$ be a real-valued random variable, and suppose that $b$ is a real
number such that both the quantities

\[
P(\{\omega : X(\omega) < b\}) \quad \text{and} \quad P(\{\omega : X(\omega) > b\})
\]

are less than or equal to 1/2. Then $b$ is called a median of the distribution of $X$.
Show that every distribution has at least one median, and find some distributions
with more than one median, as well as distributions with exactly one median. Show 
that if a distribution is symmetric about b, then b is a median of that distribution. 
Is it possible for a distribution that is symmetric about b to have a median that is 
not equal to b? 


Problem 25. Describe a graphical procedure for finding all the medians of a dis- 
tribution, given the graph of its distribution function. 


Problem 26. Show that if the support of a distribution is the entire real line, then 
the distribution has exactly one median. 


Problem 27. Which of the families uniform, delta, Bernoulli, Cauchy, and geomet- 
ric, contain distributions of different types? Which of them contain more than one 
member having the same support? 




3.4. Distributions with densities 
Some important distribution functions can be represented as integrals. 


Definition 9. Let $F$ be a distribution function that can be represented in
the form

\[
F(x) = \int_{-\infty}^{x} f(t)\, dt.
\]

Then $f$ is a density of $F$, and also of the corresponding probability measure and
of any corresponding random variable.


Whenever a density exists for a distribution function F, then F has infinitely 
many densities. For instance, the value of f can be changed at finitely many 
points without changing F. In spite of this fact, we will sometimes loosely speak 
of ‘the’ density of F. In many important cases, a distribution function F will 
have a density that is continuous on the support of F and vanishes elsewhere. 
There can be at most one such density, and when there is one, the phrase ‘the 
density’ usually refers to it. More will be said about the concept of density in 
Chapter 8. 


* Problem 28. Find the densities of all Cauchy and uniform distributions. 


Problem 29. Show that if $f$ is a nonnegative function that is Riemann-integrable
on every closed bounded interval in $\mathbb{R}$ and if

\[
\int_{-\infty}^{\infty} f(t)\, dt = 1,
\]

then $f$ is the density of some distribution.


Problem 30. [Exponential distributions] Let $a$ be a positive real parameter. Show
that

\[
f(x) = \begin{cases} a e^{-ax} & \text{if } x > 0 \\ 0 & \text{if } x \le 0 \end{cases}
\]

is a density. Let $X$ be a random variable with density $f$. Calculate the probability
that $X$ belongs to the interval $[2, 3]$. Also, calculate the median of the distribution
of $X$.


Problem 31. Let X have the distribution of the preceding problem. For n =
1,2,..., let $Y_n$ equal |X|/n. Prove that $Y_n$ is a random variable, and calculate
and identify its distribution function.


Problem 32. Let $X$ be a random variable with density $f$. Let $a > 0$ and $b$ be
constants. Show that $x \mapsto \frac{1}{a} f((x - b)/a)$ is a density of the random variable
$aX + b$. Formulate a theorem about densities of the same type.


* Problem 33. Let $X$ be a random variable with density $f$. Show that $X^2$ has a
density $g$, and find a formula for $g$ in terms of $f$.




Example 1. [Normal or Gaussian distributions] We show that

\[
f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}
\]

is a density. Obviously it is nonnegative. The square of its integral over $\mathbb{R}$ is

\[
\begin{aligned}
\frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-x^2/2}\, dx \int_{-\infty}^{\infty} e^{-y^2/2}\, dy
&= \frac{1}{2\pi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-(x^2 + y^2)/2}\, dx\, dy \\
&= \frac{1}{2\pi} \int_{0}^{2\pi} \int_{0}^{\infty} e^{-r^2/2}\, r\, dr\, d\theta = 1.
\end{aligned}
\]

(Since the integrand is nonnegative and continuous, the replacement of an
iterated integral by a double integral and the change of variables in the double
integral are justified by results from advanced calculus. The validity of these
steps is also a consequence of Theorem 10 and Theorem 15, both of Chapter 9.)
Thus, $f$ is a density. A simple change of variables shows that densities for other
distributions of the same type can be written in the form

\[
\frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-b)^2/2\sigma^2}.
\]
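As a numerical sanity check on Example 1, and not a substitute for the polar-coordinates argument, one can approximate the integral of the normal density by a midpoint rule. The sketch below is an illustration under stated assumptions: the function names, the truncation of the real line to ten standard deviations on each side of the center, and the parameter name `sigma` for the scale are all choices made here rather than taken from the text.

```python
import math


def normal_density(x, b=0.0, sigma=1.0):
    """Density of normal type with location b and scale sigma (names assumed)."""
    return math.exp(-((x - b) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)


def midpoint_integral(f, lo, hi, steps=20_000):
    """Midpoint-rule approximation to the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


# Truncate the real line to ten scale units on each side of the center.
standard_total = midpoint_integral(normal_density, -10.0, 10.0)
shifted_total = midpoint_integral(lambda x: normal_density(x, b=3.0, sigma=2.0), -17.0, 23.0)
print(standard_total, shifted_total)
```

Both approximations should be very close to 1, in agreement with the claim that each function of this form is a density.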


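Example 2. [Gamma distributions] For $\gamma$ a positive real parameter, the
gamma function $\Gamma$ is defined by

\[
\Gamma(\gamma) = \int_{0}^{\infty} u^{\gamma - 1} e^{-u}\, du,
\]

which is easily seen to be a convergent improper integral. It follows that the
function

\[
f(x) = \begin{cases} \dfrac{a^{\gamma} x^{\gamma - 1} e^{-ax}}{\Gamma(\gamma)} & \text{if } x > 0 \\[1ex] 0 & \text{if } x \le 0 \end{cases}
\]

is a density for all $a > 0$. When the parameter $\gamma$ equals 1, a gamma distribution
is an exponential distribution.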
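The two claims of Example 2 can be checked numerically. The following sketch is illustrative only: it relies on `math.gamma` from the Python standard library, the function names are hypothetical, and the integral over $(0, \infty)$ is replaced by a midpoint rule on a long finite interval, which is an approximation chosen here.

```python
import math


def gamma_density(x, gamma, a=1.0):
    """Gamma density with shape parameter gamma and rate parameter a."""
    if x <= 0:
        return 0.0
    return a ** gamma * x ** (gamma - 1) * math.exp(-a * x) / math.gamma(gamma)


def exponential_density(x, a=1.0):
    """Exponential density with parameter a."""
    return a * math.exp(-a * x) if x > 0 else 0.0


def midpoint_integral(f, lo, hi, steps=20_000):
    """Midpoint-rule approximation to the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


# Total mass for shape 3 and rate 2, truncating the half-line at 40.
total_mass = midpoint_integral(lambda x: gamma_density(x, gamma=3.0, a=2.0), 0.0, 40.0)

# With shape parameter 1 the gamma density reduces to the exponential density.
agrees_with_exponential = all(
    math.isclose(gamma_density(x, 1.0, 2.0), exponential_density(x, 2.0))
    for x in (0.1, 1.0, 5.0)
)
print(total_mass, agrees_with_exponential)
```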


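* Problem 34. Prove the following four facts about the gamma function $\Gamma$:

(i) $\Gamma(\gamma + 1) = \gamma\, \Gamma(\gamma)$ for $\gamma > 0$.

(ii) $\Gamma(\gamma) = (\gamma - 1)!$ for $\gamma = 1, 2, \dots$.

(iii) $\Gamma(\gamma) = \dfrac{\sqrt{\pi}\, (2\gamma - 1)!}{((2\gamma - 1)/2)!\; 2^{2\gamma - 1}}$ for $\gamma = \frac12, \frac32, \frac52, \dots$.

(iv) $\Gamma(\alpha)\Gamma(\beta) = \Gamma(\alpha + \beta) \displaystyle\int_{0}^{1} x^{\alpha - 1} (1 - x)^{\beta - 1}\, dx$ for $\alpha, \beta > 0$.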




Problem 35. Let $Y$ denote a random variable having some gamma distribution
as described in Example 2 with $a = 1$. Find the density of the random variable
$\exp \circ (-Y)$.


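Example 3. [Beta distributions] Fix parameters $\alpha, \beta > 0$. From the last fact
in Problem 34 we see that

\[
f(x) = \begin{cases} \dfrac{\Gamma(\alpha + \beta)\, x^{\alpha - 1} (1 - x)^{\beta - 1}}{\Gamma(\alpha)\Gamma(\beta)} & \text{if } 0 < x < 1 \\[1ex] 0 & \text{otherwise} \end{cases}
\]

is a density, called the beta density. The beta distribution with parameters $\alpha = 1$
and $\beta = 1$ is a uniform distribution.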


Problem 36. [Arcsin distribution] Calculate the distribution function of the beta
distribution having parameters $\alpha = \beta = 1/2$. Sketch the graph of its density; the
distribution function itself is shown in Figure 3.3.


FIGURE 3.3. Arcsin distribution function: $F(x) = \frac{2}{\pi} \arcsin \sqrt{x}$


3.5. Further examples 


A variety of distribution functions will be introduced in this section, including 
one (Problem 42) that does not have a density even though it is continuous. 
The distributions described in the next three problems play important roles in 
probability theory. 




Problem 37. [Poisson distributions] Let $\lambda \in (0, \infty)$. Prove that there exists a
distribution function that has a jump of size

\[
\frac{\lambda^{x} e^{-\lambda}}{x!}
\]

at each nonnegative integer $x$.


Problem 38. The distribution obtained in Example 3 of Chapter 1 of the number 
of heads in n flips of a fair coin is called a ‘binomial distribution’. Sketch the 
distribution function for the case n = 4. 


Problem 39. [Binomial distributions] Fix a positive integer $n$ and a number $p \in
(0,1)$. Let $q = 1 - p$. Prove that there exists a distribution function that, for each
integer $x$ satisfying $0 \le x \le n$, has a jump at $x$ of size

\[
\binom{n}{x} p^{x} q^{n - x}.
\]


* Problem 40. Calculate and name the distribution function of $-\log \circ [X/b]$, where
$X$ is a random number uniformly distributed on $(0, b]$.


Example 4. As in Example 3 of Chapter 2, let $\Psi$ be the space of chords
of the unit circle in $\mathbb{R}^2$, and let $\mathcal{B}$ denote the Borel field of subsets of $\Psi$. Let
$Y \colon \Psi \to \mathbb{R}$ be the function that assigns to each chord in $\Psi$ its length. The reader
may check that $Y$ is continuous, and hence measurable.

In Chapter 2 four different interpretations of the experiment of choosing a
random chord were given by defining four $\Psi$-valued random variables $X_i$, $i =
1, 2, 3, 4$. Each of these random variables induces a distribution $Q_i$ on $(\Psi, \mathcal{B})$.
By Proposition 7 of Chapter 2, the distribution of $Y$ as a random variable on
the probability space $(\Psi, \mathcal{B}, Q_i)$ is the same as the distribution of $Y \circ X_i$ on the
underlying probability space, the members of which are ordered pairs $(\alpha, \beta)$.

In this example, we compute the distribution function $G_1$ of $Y \circ X_1$. A little
trigonometry gives

\[
Y \circ X_1(\omega) = 2\,|\sin \pi(\beta - \alpha)|.
\]

Clearly $G_1(y) = 0$ for $y \le 0$ and $G_1(y) = 1$ for $y \ge 2$. For $y \in (0, 2)$,
$\{\omega : Y \circ X_1(\omega) \le y\}$ consists of a strip centered on the line of slope 1 through the origin,
together with two right triangles having right angles at $(1, 0)$ and $(0, 1)$ and
having heights (measured from the hypotenuse) equal to half the width of the
strip, as illustrated by the shaded region in Figure 3.4. The area of this three-part
region is $2^{1/2}$ multiplied by the width of the strip. The width of the strip
is $2^{1/2}$ multiplied by the value of $\beta$ at the intersection of the upper edge of the
strip with the axis $\alpha = 0$. Setting $\alpha = 0$ and solving for $\beta$ in the expression for
$Y \circ X_1$, we find that this intersection occurs at $\beta = [\arcsin(y/2)]/\pi$. Thus,

\[
G_1(y) = \frac{2}{\pi} \arcsin \frac{y}{2} \quad \text{for } 0 < y < 2.
\]
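The formula for $G_1$ can be compared with a simulation. The sketch below is only an illustration: it assumes, following the description of the underlying space as ordered pairs $(\alpha, \beta)$ in the unit square, that $\alpha$ and $\beta$ are independent and uniformly distributed on $[0, 1)$ (the precise construction is in Chapter 2 and is not restated here). The function names, sample size, and seed are choices made for this sketch.

```python
import math
import random


def chord_length(alpha, beta):
    """Length of the chord determined by (alpha, beta): 2|sin(pi(beta - alpha))|."""
    return 2 * abs(math.sin(math.pi * (beta - alpha)))


def G1(y):
    """Distribution function obtained in Example 4."""
    if y <= 0:
        return 0.0
    if y >= 2:
        return 1.0
    return (2 / math.pi) * math.asin(y / 2)


def empirical_cdf(y, samples):
    """Fraction of the samples that are at most y."""
    return sum(s <= y for s in samples) / len(samples)


rng = random.Random(0)  # fixed seed so that the run is reproducible
samples = [chord_length(rng.random(), rng.random()) for _ in range(200_000)]

for y in (0.5, 1.0, 1.5):
    print(y, round(empirical_cdf(y, samples), 4), round(G1(y), 4))
```

For instance, $G_1(1) = (2/\pi)(\pi/6) = 1/3$, and the empirical value at $y = 1$ should be close to that.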




FIGURE 3.4. $\{\omega : Y \circ X_1(\omega) \le y\}$


* Problem 41. For $Y$ as defined in the preceding example, calculate the distribution
function of $Y \circ X_i$ for $i = 2, 3, 4$.


Problem 42. [Cantor distribution] Consider the coin-toss probability space of
Example 4 of Chapter 1, and let

\[
X((\omega_1, \omega_2, \dots)) = 0.\varepsilon_1 \varepsilon_2 \varepsilon_3 \ldots_{\text{three}},
\]

where each $\varepsilon_i = 2\omega_i$ and the subscript ‘three’ indicates that the expression is to
be regarded as a base-three numeral. Sketch a graph of the distribution function
$F$ of $X$. Prove that $F$ is continuous. Prove that there exists a Borel set $A \subseteq [0, 1]$
with Lebesgue measure equal to 1, such that for all $x \in A$, $F'(x) = 0$.


3.6. Distribution functions for the extended real line 


The value 23 used in Problem 10 is artificial. A natural value to use for the first
time that the coin comes up tails is $\infty$ in case all of the infinitely many flips
come up heads. Indeed, it is usual to define the infimum of the empty set in $\mathbb{R}$
to equal $\infty$. Thus, in this example it is natural to consider a random variable
that is $\overline{\mathbb{R}}$- or $\overline{\mathbb{R}}^+$-valued.


Definition 10. An $\mathbb{R}$-valued function $F$ defined on $\overline{\mathbb{R}}$ is a distribution
function for $\overline{\mathbb{R}}$ if it is increasing and right-continuous and $0 \le F(x) \le 1$ for every
$x \in \overline{\mathbb{R}}$.


If $Q$ is a distribution on $\overline{\mathbb{R}}$, then the distribution function of $Q$ is the function
$x \mapsto Q([-\infty, x])$. A theory for distribution functions for $\overline{\mathbb{R}}$ can be given that
parallels that for distribution functions for $\mathbb{R}$. Since the similarities with the
theory for $\mathbb{R}$ are so strong, we omit the details and only comment that if $F$ is the
distribution function of a distribution $Q$ on $\overline{\mathbb{R}}$ and if $F(-\infty+)$, $F(\infty-)$ are the limits,
respectively, of $F$ at $-\infty$, $\infty$, then $F(-\infty+) = Q(\{-\infty\})$ and $1 - F(\infty-) = Q(\{\infty\})$.




Example 5. For the fair-coin probability space of Example 4 of Chapter 1,
let

\[
N(\omega) = \inf\{n : \omega_n = \omega_{n+1} = \dots = \omega_{2n} = 1\}.
\]

That is, $N$ is the first time at which there begins a sequence of heads longer than
the number of flips up to and including the beginning of that sequence; $N = \infty$
if there is no such time. Let us show that $N = \infty$ with positive probability.
Clearly,

\[
P(\{\omega : N(\omega) = n\}) \le P(\{\omega : \omega_n = \omega_{n+1} = \dots = \omega_{2n} = 1\}) = 2^{-(n+1)}.
\]

Hence, with $F$ denoting the distribution function of $N$,

\[
F(\infty-) = \sum_{n=1}^{\infty} P(\{\omega : N(\omega) = n\}) \le \sum_{n=1}^{\infty} 2^{-(n+1)} = \frac{1}{2}.
\]

By proving that $P(\{\omega : N(\omega) = \infty\}) > 0$, we have shown that $\infty$ is in the
support of $N$. However, it should be noted that $\infty$ can be in the support of a
random variable $X$ even if $P(\{\omega : X(\omega) = \infty\}) = 0$. For an example, take any
geometrically distributed random variable with parameter $p > 0$, regarded as an
$\overline{\mathbb{R}}$-valued random variable.
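The union bound used in Example 5 can be checked by brute force for small horizons. The following sketch is an illustration, not part of the original argument: the event $\{N \le K\}$ depends only on the first $2K$ flips, so its probability can be computed exactly by enumerating all $2^{2K}$ equally likely outcomes, with 1 standing for heads as in the definition of $N$. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def first_long_run(flips):
    """Least n with flips n, n+1, ..., 2n all equal to 1 (1-indexed), else None.

    Only values of n with 2n <= len(flips) can be examined.
    """
    horizon = len(flips) // 2
    for n in range(1, horizon + 1):
        if all(flips[i - 1] == 1 for i in range(n, 2 * n + 1)):
            return n
    return None


def prob_N_at_most(K):
    """Exact value of P(N <= K), by enumerating the first 2K fair-coin flips."""
    outcomes = product((0, 1), repeat=2 * K)
    hits = sum(1 for flips in outcomes if first_long_run(flips) is not None)
    return Fraction(hits, 2 ** (2 * K))


K = 6
exact = prob_N_at_most(K)
union_bound = sum(Fraction(1, 2 ** (n + 1)) for n in range(1, K + 1))
print(exact, union_bound)
```

The exact probability stays below the partial sums of $\sum 2^{-(n+1)}$, which in turn stay below 1/2, consistent with $N = \infty$ having positive probability.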


CHAPTER 4 
Expectations: Theory 


The ‘expectation’ of an R-valued random variable is a weighted average of the 
values taken by that random variable. It is a useful tool for the description and 
analysis of random variables and their distributions. Properties of expectations 
treated in this chapter include linearity and an important convergence theorem. 
The calculation of expectations is facilitated by establishing a connection with 
Riemann-Stieltjes integration. 


4.1. Definitions 


The expectation is first defined for simple random variables. 


Definition 1. Let $X$ be a random variable defined on a probability space
$(\Omega, \mathcal{F}, P)$ and having the form

\[
X = \sum_{j=1}^{n} c_j I_{C_j} \tag{4.1}
\]

for some distinct real constants $c_j$ and events $C_j$ that constitute a partition of
$\Omega$. Then the expectation of $X$ equals

\[
\sum_{j=1}^{n} c_j P(C_j) \tag{4.2}
\]

and is denoted by $E(X)$.
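Definition 1 translates directly into a computation on a finite probability space. The sketch below is a minimal illustration under assumptions made here: the sample space consists of the four outcomes of two flips of a fair coin, $X$ counts the heads, and the helper name `expectation_simple` is hypothetical. The events $C_j = \{\omega : X(\omega) = c_j\}$ are formed by grouping outcomes according to the distinct values of $X$.

```python
from fractions import Fraction
from itertools import product

# Sample space: two flips of a fair coin; every outcome has probability 1/4.
omega = list(product((0, 1), repeat=2))
P = {w: Fraction(1, 4) for w in omega}


def X(w):
    """Number of heads (entries equal to 1) in the outcome w."""
    return sum(w)


def expectation_simple(X, P):
    """Sum of c_j * P(C_j), where C_j is the event that X equals the value c_j."""
    prob_of_value = {}
    for w, p in P.items():
        prob_of_value[X(w)] = prob_of_value.get(X(w), 0) + p
    return sum(c * p for c, p in prob_of_value.items())


print(expectation_simple(X, P))  # 0 * 1/4 + 1 * 1/2 + 2 * 1/4
```

Grouping by distinct values is what makes the representation (4.1) unique; exact rational arithmetic avoids rounding in the sum (4.2).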


By definition, every random variable of the form (4.1) is simple, and by 
Lemma 9 of Chapter 2, every simple random variable can be written uniquely 
in the form (4.1). 


Problem 1. Part of the preceding sentence assures us that Definition 1 is not am- 
biguous. Explain. 




Problem 2. Prove that if $X$ is a simple random variable, then $E(X)$ can be written
in terms of the distribution function $F$ of $X$:

\[
E(X) = \sum_{x \in \mathbb{R}} x\,[F(x) - F(x-)],
\]

the summation being meaningful since there are only finitely many nonzero terms.


The set of simple random variables on a probability space is a real vector space 
since it is closed under multiplication by a real number and under addition. We 
may think of E as a function defined on that vector space. Since the elements of 
the vector space are themselves functions, it is customary to call E an operator
rather than a function. Thus, E may be called the expectation operator, with 
domain equal to the set of simple random variables defined on any particular 
probability space. We will see in Lemma 3 that E is linear. The following 
lemma is the first step towards the proof of Lemma 3. 


Lemma 2. Let $X = \sum_{j=1}^{n} c_j I_{C_j}$ be a simple random variable and suppose
that $(C_j : 1 \le j \le n)$ is a finite sequence of pairwise disjoint events whose union
is $\Omega$. Then

\[
E(X) = \sum_{j=1}^{n} c_j P(C_j).
\]


Problem 3. Prove the preceding lemma. (Notice that it is not assumed that the
real constants $c_j$ are distinct, nor is it assumed that the events $C_j$ are nonempty.)


In some vector spaces there is a natural concept of positiveness. For instance,
in $\mathbb{R}^2$, any ordered pair, other than (0,0), that has two nonnegative coordinates
might be called positive. It is required that the set of positive members of a vector
space be closed under both multiplication by positive scalars and addition. For
the vector space of simple random variables on a probability space, the positive
members are those simple random variables, other than the zero function, whose
values are nonnegative. An operator on a vector space is said to be positive if it
maps all positive members of the vector space into $\mathbb{R}^+$.


Lemma 3. The expectation operator E on the vector space of simple random 
variables on some probability space is both linear and positive. 


PROOF. Consider an arbitrary simple random variable $X$ and a real number
$b$. By Lemma 9 of Chapter 2, $X$ can be represented in the form (4.1), and so

\[
bX = \sum_{j=1}^{n} (b c_j) I_{C_j}.
\]

By Lemma 2, $E(bX) = \sum_{j=1}^{n} (b c_j) P(C_j) = bE(X)$.




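To complete the proof of linearity of $E$, consider two simple random variables

\[
X = \sum_{j=1}^{n} c_j I_{C_j} \quad \text{and} \quad Y = \sum_{k=1}^{p} d_k I_{D_k},
\]

where both $\{C_j : 1 \le j \le n\}$ and $\{D_k : 1 \le k \le p\}$ are partitions of the
underlying probability space $\Omega$. Then $(C_j \cap D_k : 1 \le j \le n,\ 1 \le k \le p)$ consists of
pairwise disjoint events whose union is $\Omega$ and

\[
X + Y = \sum_{j=1}^{n} \sum_{k=1}^{p} (c_j + d_k) I_{C_j \cap D_k}.
\]

By Lemma 2,

\[
\begin{aligned}
E(X + Y) &= \sum_{j=1}^{n} \sum_{k=1}^{p} (c_j + d_k) P(C_j \cap D_k) \\
&= \Big( \sum_{j=1}^{n} \sum_{k=1}^{p} c_j P(C_j \cap D_k) \Big) + \Big( \sum_{k=1}^{p} \sum_{j=1}^{n} d_k P(C_j \cap D_k) \Big) \\
&= \Big( \sum_{j=1}^{n} c_j P(C_j) \Big) + \Big( \sum_{k=1}^{p} d_k P(D_k) \Big) = E(X) + E(Y),
\end{aligned}
\]

as desired.

Turning to positivity, we suppose that $X$ is given by (4.1) and that its image
consists only of nonnegative numbers. Then each $c_j$ in (4.1) is nonnegative and,
hence, by (4.2), $E(X) \ge 0$. Therefore, $E$ is a positive operator. □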


Problem 4. Suppose that

\[
X = \sum_{i=1}^{m} a_i I_{A_i}
\]

for some integer $m$, some events $A_i$, and some real constants $a_i$. Prove that

\[
E(X) = \sum_{i=1}^{m} a_i P(A_i).
\]

(Notice that there is no assumption that $\{A_i : 1 \le i \le m\}$ is a partition of the
underlying probability space.)


Lemma 4. If $X$ and $Y$ are two simple R-valued random variables for which
$X \le Y$ a.s., then $E(X) \le E(Y)$. If $X = Y$ a.s., then $E(X) = E(Y)$.


Problem 5. Prove the preceding lemma. 




Expectations are called by other names as well: expected value and mean are 
two common synonyms for ‘expectation’. 


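Example 1. The mean of a simple binomially distributed random variable
(defined in Problem 39 of Chapter 3) is, by Definition 1,

\[
\sum_{x=0}^{n} x \binom{n}{x} p^{x} q^{n-x} = np \sum_{x=1}^{n} \binom{n-1}{x-1} p^{x-1} q^{n-x} = np\,(p + q)^{n-1} = np.
\]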
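The value $np$ for the binomial mean can be confirmed with exact rational arithmetic. This is a small sketch, not a proof: the function name `binomial_mean` is hypothetical, and the sum is taken over the jump sizes $\binom{n}{x} p^x q^{n-x}$ of the binomial distribution function.

```python
from fractions import Fraction
from math import comb


def binomial_mean(n, p):
    """Sum of x * C(n, x) * p**x * q**(n - x) over x = 0, ..., n, with q = 1 - p."""
    q = 1 - p
    return sum(x * comb(n, x) * p ** x * q ** (n - x) for x in range(n + 1))


print(binomial_mean(4, Fraction(1, 2)))  # equals n * p = 2
print(binomial_mean(7, Fraction(1, 3)))  # equals n * p = 7/3
```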


Problem 6. Comment on the presence of the adjective ‘simple’ in the preceding 
example. 


* Problem 7. Calculate E(X?) for a simple binomially distributed random variable 
X. 


* Problem 8. Calculate the expected value of the outcome of a single roll of a fair 
six-sided die. 


The next definition extends the concept of expectation to all $\overline{\mathbb{R}}^+$-valued random
variables.


Definition 5. The expectation $E(X)$ of an $\overline{\mathbb{R}}^+$-valued random variable $X$
defined on a probability space $(\Omega, \mathcal{F}, P)$ equals $\sup_Z E(Z)$, where the supremum
is taken over all nonnegative simple random variables $Z$ on $(\Omega, \mathcal{F}, P)$ that satisfy
$Z \le X$.


We notice several things about the preceding definition. It applies when $X$ is
a nonnegative simple random variable; it had better agree with the previously
given definition in this case. It is conceivable that a random variable can have
expectation equal to $\infty$. The zero random variable necessarily qualifies as one
of the random variables $Z$ over which the supremum is to be taken, so $E$ as thus
extended satisfies $E(X) \ge 0$ for nonnegative random variables $X$. Lemma 6
below also shows that the extended version of $E$ is still linear.


* Problem 9. For X a nonnegative simple random variable prove that E(X) as de- 
fined in Definition 5 equals E(X) as defined in Definition 1. 


* Problem 10. Let $X$ be an $\overline{\mathbb{R}}^+$-valued random variable, and let $A = \{\omega : X(\omega) =
\infty\}$. Show that if $P(A) > 0$, then $E(X) = \infty$. Find an example for which
$P(A) = 0$, but the expectation of $X$ is still infinite.




Lemma 6. Let $X$ and $Y$ be $\overline{\mathbb{R}}^+$-valued random variables defined on a common
probability space, and let $a$ and $b$ be constants belonging to $\overline{\mathbb{R}}^+$. Then,

\[
E(aX + bY) = aE(X) + bE(Y),
\]

whether finite or infinite, provided that $0 \cdot \infty$ and $\infty \cdot 0$ are understood to equal
0.


* Problem 11. Prove the preceding lemma.


Lemma 7. Let $X$ and $Y$ be $\overline{\mathbb{R}}^+$-valued random variables defined on a common
probability space. If $X \le Y$ a.s., then $E(X) \le E(Y)$. If $X = Y$ a.s., then
$E(X) = E(Y)$.


Problem 12. Prove the preceding lemma, making sure to encompass the cases of 
infinite expectation. Hint: Use Lemma 4. 


In preparation for extending the concept of expectation to an R-valued random
variable $X$, we introduce two $\overline{\mathbb{R}}^+$-valued random variables related to $X$. For each
$\omega$, set

\[
X^+(\omega) = 0 \vee X(\omega) \quad \text{and} \quad X^-(\omega) = 0 \vee [-X(\omega)].
\]

The function $X^+$ is called the positive part of $X$, and $X^-$ is called its negative
part.


Problem 13. For $X$ an R-valued random variable, prove that $X^+$ and $X^-$ as just
defined are $\overline{\mathbb{R}}^+$-valued random variables for which $X = X^+ - X^-$ and $|X| =
X^+ + X^-$.


Since $X^+$ and $X^-$ are both $\overline{\mathbb{R}}^+$-valued random variables, $E(X^+)$ and $E(X^-)$
are both defined. We will define the expectation of $X$ to be the difference of
these two values, when the difference is meaningful.


Definition 8. Let $X$ be a random variable taking values in R. If $E(X^+)$ and
$E(X^-)$ are not both infinite, the expectation of $X$ is given by

\[
E(X) = E(X^+) - E(X^-).
\]

Otherwise, the expectation of $X$ does not exist.


Since $X^- = 0$ for an $\overline{\mathbb{R}}^+$-valued random variable $X$, it is obvious that this
new definition of $E(X)$ agrees with the previous one for such $X$. The following
exercise asks the reader to prove that no ambiguity arises from the new definition
of $E(X)$ when $X$ is a simple (not necessarily $\overline{\mathbb{R}}^+$-valued) random variable.




* Problem 14. For X a simple random variable, show that E(X) as defined by Def- 
inition 1 equals E(X) as defined by Definition 8. 


A glance at Definition 1, which is needed to make sense of Definition 5 and 
Definition 8, shows that the definition of expectation involves the underlying 
probability measure P. Thus, we have been somewhat imprecise in speaking 
of ‘the’ expectation operator; changing the underlying probability space changes 
the expectation operator. This lack of precision is justified in those circumstances 
when the underlying probability space is clear from the context. And, as we will 
soon see, the expectation of a random variable is determined by its distribution, 
so when the distribution of a random variable X is known, knowledge of the 
underlying probability space is not necessary for the calculation of E(X). 

Nevertheless, it is sometimes necessary to distinguish expectation operators 
defined for different probability spaces. (See Theorem 15 for an example.) In 
such cases, the expectation operator on the space of R-valued random variables
with underlying probability space $(\Omega, \mathcal{F}, P)$ is denoted by $E_P$ and is called the
expectation operator with respect to $P$. The quantity $E_P(X)$ is called the
expectation of $X$ with respect to $P$.


4.2. Linearity and positivity


In this section, we prove linearity and other basic properties of the expectation 
operator. Preliminary versions of some of these properties have appeared in 
Lemma 3, Lemma 4, Lemma 6, and Lemma 7. 

The following convention, already used in Lemma 6, will simplify our discus- 
sion. We will adhere to it throughout the book. 


CONVENTION. Unless explicitly stated otherwise, the products $0 \cdot \infty$ and $\infty \cdot 0$
will be interpreted to equal 0.


Theorem 9. Let $X$ and $Y$ be R-valued random variables defined on a
probability space $(\Omega, \mathcal{F}, P)$, and let $a$, $b$, and $c$ be real constants.

(i) If $aX(\omega) + bY(\omega)$ is defined for all $\omega$, then $E(aX + bY) = aE(X) +
bE(Y)$, provided the expression on the right is meaningful.

(ii) If $X = c$ a.s., then $E(X) = c$.

(iii) If $X = Y$ a.s., then either the expectations of $X$ and $Y$ both exist
and are equal, or neither exists.

(iv) If $X \le Y$ a.s. and either $E(X)$ exists and is different from $-\infty$ or
$E(Y)$ exists and is different from $\infty$, then the other of $E(Y)$ and $E(X)$
exists and $E(X) \le E(Y)$.

(v) If $E(X) = E(Y)$ is finite and $X \le Y$ a.s., then $X = Y$ a.s.

(vi) If $E(X)$ exists, then $|E(X)| \le E(|X|)$.

(vii) If $E(X)$ does not exist, then $E(|X|) = \infty$.

(viii) If $X(\omega) + Y(\omega)$ is defined for all $\omega$, then $E(|X + Y|) \le E(|X|) +
E(|Y|)$.




PARTIAL PROOF. We first prove a special case of (i), namely that

\[
E(X + Y) = E(X) + E(Y) \tag{4.3}
\]

whenever the expression on the right is meaningful. We use the following identity:

\[
(X + Y)^+ + Y^- + X^- = (X + Y)^- + Y^+ + X^+.
\]

On each side we have a sum of three $\overline{\mathbb{R}}^+$-valued random variables. By Lemma 6,

\[
E((X + Y)^+) + E(Y^-) + E(X^-) = E((X + Y)^-) + E(Y^+) + E(X^+).
\]

If all terms are finite, they can be rearranged to give the desired conclusion:

\[
E((X + Y)^+) - E((X + Y)^-) = E(X^+) - E(X^-) + E(Y^+) - E(Y^-).
\]

Consideration of the cases involving some infinite terms is left for the reader.

We also leave it to the reader to use the definition of $E(X)$ in conjunction
with Lemma 6 to prove that $E(aX) = aE(X)$ for all real $a$, provided that $E(X)$
exists. Assertion (i) follows from this fact and (4.3).

We leave it to the reader to prove (iv). Then (iii) follows from two applications
of (iv), one using (iv) as it stands and the other using (iv) with $X$ and $Y$
interchanged. Since the expected value of a constant random variable is that
constant, (ii) is a special case of (iii). The contrapositive of (vii) is a consequence
of (iv) and the inequality $X \le |X|$. Thus, it remains for us to prove (v), (vi),
and (viii).

Assertion (viii) follows from the inequality $|X + Y| \le |X| + |Y|$, (iv), and the
consequence $E(|X| + |Y|) = E(|X|) + E(|Y|)$ of (i).

Suppose that $E(X)$ exists. By (i), $E(-X) = -E(X)$ also exists. Then two
applications of (iv), one to $X$ and $|X|$ and the other to $-X$ and $|X|$, give
(vi).

To prove (v) we suppose that $E(X) = E(Y)$ is finite and $X \le Y$ a.s. The
expected value of the nonnegative random variable $(Y - X)$ exists, so that (i) may
be applied to the equality $Y = X + (Y - X)$ to give $E(Y) = E(X) + E(Y - X)$,
from which it follows that $E(Y - X) = 0$. Thus, for any $\varepsilon > 0$, the simple
function that equals 0 when $Y - X < \varepsilon$ and equals $\varepsilon$ when $Y - X \ge \varepsilon$ has 0
expectation. Therefore,

\[
P(\{\omega : Y(\omega) - X(\omega) \ge \varepsilon\}) = 0.
\]

Let $\varepsilon \searrow 0$ through a sequence and use the Continuity of Measure Theorem to
obtain $P(\{\omega : Y(\omega) - X(\omega) > 0\}) = 0$, as desired. □


Problem 15. Complete the proof of the preceding theorem. 




For a random variable $X$, let $[X] = \{Y : Y = X \text{ a.s.}\}$. Thus, $[X]$ is the
equivalence class of $X$ mentioned in Chapter 2. In view of property (iii) in the
preceding theorem, we may, with no ambiguity, define $E([X]) = E(X)$, with the
understanding that the left side is defined if and only if the right side is defined.
Therefore, $E$ becomes an operator on equivalence classes.

From (v) of the preceding theorem we see that $E([Y]) > 0$ if $Y \ge 0$ and
$[Y] \ne [0]$. A positive operator on a vector space (in which certain members have
been identified as positive) is said to be strictly positive if the value it assigns
to each positive member of the vector space is positive (not just nonnegative).
In the following corollary we use the notation $|[X]|$; doing so is legitimate since
$[|X|] = [|Y|]$ whenever $[X] = [Y]$.


Corollary 10. The equivalence classes $[X]$ of random variables $X$ on a
probability space $(\Omega, \mathcal{F}, P)$ for which $E(|[X]|) < \infty$ constitute a vector space on which
the expectation operator is linear and strictly positive.


Problem 16. Prove the preceding corollary. 


In practice one often does not use an equivalence class notation such as [X], 
even when one is taking an equivalence-class point of view. 

Suppose that $X$ is a random variable defined on some sample space $\Omega$, and
that $Y$ is a function that is defined at some but not necessarily all of the points
in $\Omega$. If the set of points where $Y$ is undefined is contained in some null event
$A$ and if $Y(\omega) = X(\omega)$ for $\omega \in A^c$, then we write $X = Y$ a.s. and say that $Y$
is an a.s.-defined random variable. It is sometimes convenient to include such
functions $Y$ in the equivalence class of $X$, and to define $E(Y)$ to equal $E(X)$
whenever the latter exists. Generally speaking, the theory of random variables
and their expectations is easily adapted to accommodate a.s.-defined random
variables.


Problem 17. Let X and Y be a.s.-defined R-valued random variables on the same 
probability space. Prove that if E(X) + E(Y) is meaningful, then X +Y is an 
a.s.-defined random variable. Use this fact to improve Theorem 9, especially parts 
(i) and (viii). 


The following example illustrates an important way in which linearity is used 
to compute expectations. 


Example 2. Consider a deck of $n$ cards, labeled from 1 to $n$. Suppose they
are shuffled so that each of the $n!$ possible arrangements is equally likely. Let
$X$ be the number of cards which occupy positions equal to their labels. We will
calculate the expected value of $X$. Let $I_m$ equal the indicator function of the
event that the card with label $m$ is in position $m$. Then $X = \sum_{m=1}^{n} I_m$, and

\[
E(I_m) = P(\{\omega : I_m(\omega) = 1\}) = \frac{(n-1)!}{n!} = \frac{1}{n}.
\]

Hence, by property (i) in Theorem 9,

\[
E(X) = E\Big( \sum_{m=1}^{n} I_m \Big) = \sum_{m=1}^{n} E(I_m) = \sum_{m=1}^{n} \frac{1}{n} = 1.
\]

The technique of using indicator random variables to switch from probabilities 
to expectations so that linearity of expectation can then be used is a commonly 
used method for solving various problems. 
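For small decks, the expected number of cards in their own positions can be verified by listing every arrangement. The following sketch is illustrative: the function name `expected_matches` is hypothetical, and full enumeration is feasible only for small $n$ because there are $n!$ arrangements.

```python
from fractions import Fraction
from itertools import permutations


def expected_matches(n):
    """Average, over all n! equally likely arrangements, of the number of
    cards whose position equals their label."""
    arrangements = list(permutations(range(1, n + 1)))
    total = sum(
        sum(1 for position, label in enumerate(arrangement, start=1) if position == label)
        for arrangement in arrangements
    )
    return Fraction(total, len(arrangements))


for n in range(1, 8):
    print(n, expected_matches(n))
```

The enumeration requires summing over all $n!$ arrangements, whereas the indicator argument reaches the same value with a single probability, $(n-1)!/n!$, and linearity.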


Problem 18. Consider a deck of n cards labeled 1 to n, arranged so that, for each 
m, the card labeled m is in position n — m + 1. Suppose that the deck is ‘cut at 
random’. That is, one of the cards is chosen on an equiprobable basis, dividing the 
deck into two packets, one packet containing the chosen card and all cards below 
the chosen card, and the other (possibly empty) packet containing the remaining 
cards; the order of the cards within each packet is left unchanged, but the order 
of the packets within the deck is reversed. What is the expected number of cards 
which end up in positions equal to their labels? 


4.3. Monotone convergence 


The following important theorem is the first of several results concerning circum- 
stances under which it is appropriate to interchange the taking of limits with the 
taking of expectations. Other such results will be found in Chapter 8. 


Theorem 11. [Monotone Convergence] Let $(X_1, X_2, \dots)$ be an increasing
sequence of $\overline{\mathbb{R}}^+$-valued random variables on a common probability space
$(\Omega, \mathcal{F}, P)$. For each $\omega \in \Omega$ set
\[ X(\omega) = \lim_{n \to \infty} X_n(\omega). \]
Then X is a random variable and, as $n \nearrow \infty$, $E(X_n) \nearrow E(X)$ (finite or infinite).


PROOF. That X is a random variable follows from Proposition 11 of Chapter 2.
That
\[ E(X_1) \le E(X_2) \le \cdots \le E(X) \]
follows from Lemma 7. Thus, by Definition 5, we can finish the proof by showing
$\sup_n E(X_n) \ge E(X)$. In view of the definition of E(X) we only need prove
\[ \sup_n E(X_n) \ge E(Z) - \varepsilon \]
for every $\varepsilon > 0$ and every simple random variable Z that satisfies $Z \le X$.
Fix such $\varepsilon$ and Z, and, for each n, let
\[ A_n = \{\omega : X_n(\omega) \le Z(\omega) - \tfrac{\varepsilon}{2}\}. \]




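Using the linearity and positivity of E, we obtain
\[ E(Z) = E(Z I_{A_n}) + E\big((Z - \tfrac{\varepsilon}{2}) I_{A_n^c}\big) + E\big(\tfrac{\varepsilon}{2} I_{A_n^c}\big)
   \le \max\{Z(\omega) : \omega \in \Omega\}\, P(A_n) + E(X_n) + \tfrac{\varepsilon}{2}. \]
We only need show this last expression to be no larger than $E(X_n) + \varepsilon$ for some
n. We will perform this task by showing that $P(A_n) \to 0$ as $n \to \infty$.
For each $\omega$, $X_n(\omega) \nearrow X(\omega) \ge Z(\omega)$ as $n \nearrow \infty$. Hence, $A_1 \supseteq A_2 \supseteq \cdots$
and $\bigcap_{n=1}^{\infty} A_n = \emptyset$. By the Continuity of Measure Theorem, $P(A_n) \to 0$ as
$n \to \infty$. □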
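
The statement of the Monotone Convergence Theorem can be watched numerically on a simple case. The sketch below is an illustration under an assumed distribution that does not come from the text: $P(X = k) = 2^{-k}$ for $k = 1, 2, \dots$, for which E(X) = 2. The truncations $X \wedge n$ form an increasing sequence with limit X, and the helper `truncated_mean` (an illustrative name) computes $E(X \wedge n)$ exactly.

```python
from fractions import Fraction


def truncated_mean(n):
    """Exact E(min(X, n)) when P(X = k) = 2**(-k) for k = 1, 2, ...

    min(X, n) equals k on {X = k} for k < n and equals n on {X >= n},
    and P(X >= n) = 2**(-(n - 1)).
    """
    below = sum(k * Fraction(1, 2**k) for k in range(1, n))
    at_cap = n * Fraction(1, 2 ** (n - 1))
    return below + at_cap


means = [truncated_mean(n) for n in range(1, 21)]
for n, value in enumerate(means, start=1):
    print(n, float(value))
```

The printed values increase with n and approach 2 from below, as the theorem predicts for this increasing sequence.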


Corollary 12. If $(X_1, X_2, \dots)$ is a sequence of $\overline{\mathbb{R}}^+$-valued random variables,
then
\[ E\Big(\sum_{n=1}^{\infty} X_n\Big) = \sum_{n=1}^{\infty} E(X_n). \]


Problem 19. Prove the preceding corollary. 


Problem 20. Let $(A_j : j = 1, 2, \dots)$ be a sequence of events in a probability space
$(\Omega, \mathcal{F}, P)$, and let $(a_j : j = 1, 2, \dots)$ be a sequence of real numbers. Assume that
\[ \sum_{j=1}^{\infty} |a_j| P(A_j) < \infty. \]
Prove that
\[ X(\omega) = \sum_{j=1}^{\infty} a_j I_{A_j}(\omega) \]
is an almost surely defined random variable, and that
\[ E(X) = \sum_{j=1}^{\infty} a_j P(A_j). \]


The next corollary generalizes the Monotone Convergence Theorem in a useful 
way. 


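Corollary 13. Let $(X_1, X_2, \dots)$ be an increasing sequence of $\mathbb{R}$-valued random
variables on a common probability space $(\Omega, \mathcal{F}, P)$. For each $\omega \in \Omega$ set
\[ X(\omega) = \lim_{n \to \infty} X_n(\omega). \]
If $E(X_1) > -\infty$, then $E(X_n) \to E(X)$ as $n \to \infty$.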


* Problem 21. Prove the preceding corollary. 


* Problem 22. Calculate E(X) and $E(X^2)$ for a random variable X having a geometric
distribution by first writing expressions for $E(X \wedge n)$ and $E((X \wedge n)^2)$ and
then applying the Monotone Convergence Theorem as $n \to \infty$.




* Problem 23. Calculate E(X) and $E(X^2)$ for a random variable X having a Poisson
distribution by first finding expressions for $E(X \wedge n)$ and $E((X \wedge n)^2)$ and then
applying the Monotone Convergence Theorem as $n \to \infty$.


4.4. Expectation of compositions 


Given a random variable X which takes values in some measurable space $(\Psi, \mathcal{G})$,
it is often the case that we wish to find the expected value of a random variable
of the form $\varphi \circ X$, where $\varphi$ is a measurable function from $\Psi$ to $\mathbb{R}$. In making such
a calculation, it is often convenient to be able to work directly with X and its
distribution, rather than to try to apply the definitions to the random variable
$\varphi \circ X$. The results of this section are designed for this purpose.

We begin with a result that, despite its simplicity, is very useful when working
with random variables X that take on only countably many different values. In
particular, it applies when X is $\mathbb{Z}$-valued. For simplicity, the result is stated for
$\overline{\mathbb{R}}^+$-valued functions $\varphi$. It is easily extended to $\mathbb{R}$-valued functions $\varphi$ by applying
it to $\varphi^+$ and $\varphi^-$.


Proposition 14. Let X be a random variable from a probability space
$(\Omega, \mathcal{F}, P)$ into a measurable space $(\Psi, \mathcal{G})$. Assume that for each $x \in \Psi$, $\{x\}$
is a measurable set. Let Q be the distribution of X, and suppose that there exists
a countable set $A \subseteq \Psi$ such that Q(A) = 1. Then for any measurable function
$\varphi \colon \Psi \to \overline{\mathbb{R}}^+$,
\[ E(\varphi \circ X) = \sum_{x \in A} \varphi(x)\, Q(\{x\}). \]
(The summation on the right side may be taken in any order.)


PROOF. Let $(x_1, x_2, \dots)$ be an ordering of the members of A. (We will assume
for simplicity that A is infinite. The modifications required for the finite case
are obvious.) For n = 1,2,..., let
\[ \varphi_n(x) = \begin{cases} \varphi(x) & \text{if } x \in \{x_1, \dots, x_n\} \\ 0 & \text{otherwise.} \end{cases} \]
Then $\varphi_n \circ X$ is a simple function. By Lemma 2,
\[ E(\varphi_n \circ X) = \sum_{m=1}^{n} \varphi(x_m)\, Q(\{x_m\}). \]
To complete the proof, let $n \nearrow \infty$, note that $\varphi_n \circ X \nearrow \varphi \circ X$ a.s., and apply
the Monotone Convergence Theorem in connection with (iii) of Theorem 9. □
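
The formula of Proposition 14 can be compared with a direct computation on the underlying probability space. The sketch below is an illustration under assumptions that are not in the text: the space consists of the 36 equally likely ordered pairs of die faces, X is the sum of the two faces, and the function `phi` squares its argument.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def phi(x):
    # Illustrative choice of a nonnegative function on the values of X.
    return x * x


# Underlying probability space: ordered pairs of fair die faces, each with
# probability 1/36. X is the sum of the two faces.
omega = list(product(range(1, 7), repeat=2))
p_point = Fraction(1, len(omega))


def X(w):
    return w[0] + w[1]


# Expectation computed on the underlying space: sum of phi(X(w)) P({w}).
direct = sum(phi(X(w)) * p_point for w in omega)

# Distribution Q of X: Q({x}) = P({w : X(w) = x}).
Q = defaultdict(Fraction)
for w in omega:
    Q[X(w)] += p_point

# Expectation computed from the distribution alone: sum of phi(x) Q({x}).
via_distribution = sum(phi(x) * q for x, q in Q.items())

print(direct, via_distribution)
```

The second computation never looks at the sample points themselves, only at the eleven values of X and their probabilities.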


Problem 24. Discuss the similarities and differences between the proof of the pre- 
ceding proposition and the method suggested in Problem 22 and Problem 23 for 
computing the expected values of certain Z-valued random variables. 




Note that the sum on the right side of the formula given in Proposition 14
involves only the distribution Q of X and the function $\varphi$. It does not require
knowledge of the underlying probability space $(\Omega, \mathcal{F}, P)$. The following theorem
shows that in general, the quantity $E(\varphi \circ X)$ depends only on Q and the function
$\varphi$, and the corollary following the theorem shows that for $\mathbb{R}$-valued random
variables X, Q determines E(X). (Incidentally, in view of property (iii) in
Theorem 9, this corollary allows us to drop the adjective ‘simple’ from Example 1
and Problem 7.)


Theorem 15. Let X be a random variable from a probability space $(\Omega, \mathcal{F}, P)$
into a measurable space $(\Psi, \mathcal{G})$, and let Q denote the distribution of X. Denote
the expectation operators with respect to P and Q by $E_P$ and $E_Q$, respectively.
Then, for any measurable function $\varphi$ on $(\Psi, \mathcal{G})$,
\[ E_P(\varphi \circ X) = E_Q(\varphi) \]
in the sense that if either side exists, then so does the other and they are equal.


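PROOF. Suppose, first, that $\varphi$ is simple. Then $\varphi \circ X$ is simple. Thus, there
exists a finite partition $\{C_j : 1 \le j \le n\}$ of $\Omega$ and distinct real constants $c_j$,
$1 \le j \le n$, such that
\[ \varphi \circ X = \sum_{j=1}^{n} c_j I_{C_j}. \]
By Definition 1,
(4.4) \[ E_P(\varphi \circ X) = \sum_{j=1}^{n} c_j P(C_j). \]
For $1 \le j \le n$, set $B_j = \{x : \varphi(x) = c_j\}$. By definition of Q, $Q\big((\bigcup_{j=1}^{n} B_j)^c\big) = 0$.
Hence,
\[ \varphi = \sum_{j=1}^{n} c_j I_{B_j} \quad Q\text{-a.s.} \]
By Problem 4 and property (iii) in Theorem 9,
\[ E_Q(\varphi) = \sum_{j=1}^{n} c_j Q(B_j), \]
which, as desired, equals the right side of (4.4).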

Now let $\varphi$ be an arbitrary $\overline{\mathbb{R}}^+$-valued random variable on $(\Psi, \mathcal{G}, Q)$. By
Lemma 13 of Chapter 2, it equals the limit of an increasing sequence
$(\varphi_n : n = 1, 2, \dots)$ of nonnegative simple random variables. The sequence
$(\varphi_n \circ X : n = 1, 2, \dots)$ is also an increasing sequence of nonnegative random
variables, on the probability space $(\Omega, \mathcal{F}, P)$ rather than on $(\Psi, \mathcal{G}, Q)$, and its
limit is $\varphi \circ X$. Two applications of the Monotone Convergence Theorem give
\[ E_Q(\varphi) = \lim_{n \to \infty} E_Q(\varphi_n) \]




and
\[ E_P(\varphi \circ X) = \lim_{n \to \infty} E_P(\varphi_n \circ X). \]
By the preceding paragraph the right sides of these two equalities are equal term
by term, so the left sides are equal also.
Clearly, $\varphi^+ \circ X = (\varphi \circ X)^+$ and $\varphi^- \circ X = (\varphi \circ X)^-$. Hence,
\[ E_P((\varphi \circ X)^+) = E_P(\varphi^+ \circ X) = E_Q(\varphi^+) \]
and
\[ E_P((\varphi \circ X)^-) = E_P(\varphi^- \circ X) = E_Q(\varphi^-). \]
The left sides of these equalities both equal $\infty$ if and only if the same is true of the
right sides. If they do not both equal $\infty$, subtraction gives $E_P(\varphi \circ X) = E_Q(\varphi)$,
as desired. □


The following corollary shows that the expected value of a random variable 
depends only on its distribution. 


Corollary 16. Let X be an $\mathbb{R}$-valued random variable on a probability space
$(\Omega, \mathcal{F}, P)$, and let Q be the distribution of X. Then
\[ E_P(X) = E_Q(x \mapsto x) \]
in the sense that if either side exists, then so does the other and they are equal.


PROOF. Apply the theorem with $\varphi$ equal to the identity function $x \mapsto x$. □


Problem 25. Discuss the difficulties one might face in trying to prove either
Theorem 15 or Corollary 16 by approximating X (rather than $\varphi$) with simple measurable
functions.


Problem 26. Let X be an R-valued random variable that is symmetric about a 
point b (see Problem 23 of Chapter 3). Show that if E(X) exists, then E(X) =b. 


4.5. The Riemann-Stieltjes integral and expectations 


In Problem 2 we saw that the expected value of a simple random variable X on
a probability space $(\Omega, \mathcal{F}, P)$ can be represented as a Riemann-Stieltjes integral
with respect to its distribution function F, since the expression given in that
problem for E(X) equals $\int_{-\infty}^{\infty} x \, dF(x)$ (see Appendix D). Similarly, if X is
an $\mathbb{R}$-valued random variable whose distribution Q is supported by a countable
set A, and if $\varphi \colon \mathbb{R} \to \mathbb{R}$ is an appropriately nice function, then the formula in
Proposition 14 for $E(\varphi \circ X)$ can also be written in terms of a Riemann-Stieltjes
integral:
(4.5) \[ E(\varphi \circ X) = \sum_{x \in A} \varphi(x)\,[F(x) - F(x-)] = \int_{-\infty}^{\infty} \varphi(x) \, dF(x), \]
since $Q(\{x\}) = F(x) - F(x-)$ for all $x \in \mathbb{R}$.

We will show in Theorem 17 that the right and left sides of (4.5) are equal
for general $\mathbb{R}$-valued random variables X, as long as the Riemann-Stieltjes
integrals of $\varphi^+$ and $\varphi^-$ with respect to F exist and are not both infinite. Such an
equality gives us a powerful calculational tool because it is often straightforward
to calculate Riemann-Stieltjes integrals, especially when the integrator F has
a continuous density f: $F(x) = \int_{-\infty}^{x} f(u) \, du$ (see Definition 9 of Chapter 3).
In this case, the Riemann-Stieltjes integral can then be replaced by a Riemann
integral to obtain
\[ E(\varphi \circ X) = \int_{-\infty}^{\infty} \varphi(x) f(x) \, dx. \]
Once we have such formulas in place, the usual techniques of calculus are
available. See, for example, Proposition 19, where integration by parts is used.
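
The passage from a Riemann-Stieltjes integral to an ordinary Riemann integral can be illustrated numerically. The sketch below rests on assumptions made only for illustration: F is the exponential distribution function with rate 1, the function `phi` squares its argument, and the integrals are cut off at 40, beyond which the contribution is negligible for this distribution. The helper names `stieltjes_sum` and `riemann_sum` are hypothetical.

```python
import math


def F(x):
    """Distribution function of an exponential distribution with rate 1."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0


def f(x):
    """Continuous density of F on (0, infinity)."""
    return math.exp(-x) if x > 0 else 0.0


def phi(x):
    return x * x


def stieltjes_sum(phi, F, a, b, n):
    """Riemann-Stieltjes sum of phi with respect to F on [a, b],
    using n equal subintervals tagged at their right endpoints."""
    h = (b - a) / n
    total = 0.0
    for i in range(1, n + 1):
        left, right = a + (i - 1) * h, a + i * h
        total += phi(right) * (F(right) - F(left))
    return total


def riemann_sum(g, a, b, n):
    """Midpoint Riemann sum of g on [a, b] with n equal subintervals."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


# The tail beyond 40 contributes a negligible amount for this distribution.
rs = stieltjes_sum(phi, F, 0.0, 40.0, 40_000)
riemann = riemann_sum(lambda x: phi(x) * f(x), 0.0, 40.0, 40_000)
print(rs, riemann)
```

Both numbers approximate the second moment of the rate-1 exponential distribution, whose exact value is 2.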

Let X be an $\mathbb{R}$-valued random variable with distribution Q and distribution
function F. In order to compute $E_P(\varphi \circ X)$, it is sufficient by Theorem 15 to
compute $E_Q(\varphi)$. We begin by considering a function $\varphi \colon \mathbb{R} \to \mathbb{R}^+$ for which there
are real numbers $x_0 < x_1 < \cdots < x_m$ such that $\varphi$ is constant on each of the
half-open intervals $(x_{j-1}, x_j]$, $1 \le j \le m$, and is equal to 0 on the intervals
$(-\infty, x_0]$ and $(x_m, \infty)$. In this case, the Riemann-Stieltjes integral of $\varphi$ with
respect to F exists and we have
\[ \int_{-\infty}^{\infty} \varphi(x) \, dF(x) = \sum_{j=1}^{m} \varphi(x_j)\,[F(x_j) - F(x_{j-1})], \]
which by Definition 1 equals $E_Q(\varphi)$.

Now drop the assumption that $\varphi$ is constant on intervals, but keep the
assumptions that $\varphi$ is nonnegative and that $\varphi(x) = 0$ for x outside some bounded
closed interval. Assume further that $\varphi$ is bounded and Riemann-Stieltjes
integrable with respect to F. Fix a and b to the left and to the right, respectively,
of that closed interval. Choose a refining sequence $(\Pi_n : n = 1, 2, \dots)$ of point
partitions of the interval [a,b] such that the lower and upper Riemann-Stieltjes
sums for $\int_a^b \varphi \, dF$ corresponding to $\Pi_n$ converge to that integral as $n \to \infty$.

For each $x \in (r, s]$, where r < s are two adjacent members of $\Pi_n$, set
\[ \theta_n(x) = \inf\{\varphi(u) : r < u \le s\} \]
and
\[ \psi_n(x) = \sup\{\varphi(u) : r < u \le s\}; \]
and, for $x \le a$ and for $x > b$, set $\theta_n(x) = \psi_n(x) = 0$.




Note that for each n, $\theta_n$ and $\psi_n$ are $\mathbb{R}^+$-valued functions that take only
finitely many values and are constant on each of the intervals (r, s] corresponding
to adjacent members of the point partition $\Pi_n$. Thus, each $\theta_n$ and $\psi_n$ is
left-continuous and has finite right limits everywhere, so
\[ E_Q(\theta_n) = \int_{-\infty}^{\infty} \theta_n(x) \, dF(x) \quad \text{and} \quad E_Q(\psi_n) = \int_{-\infty}^{\infty} \psi_n(x) \, dF(x). \]
It is easily checked that the quantities on the right sides of these two equations
are respectively the lower and upper Riemann-Stieltjes sums corresponding to
the point partition $\Pi_n$ for the integral of $\varphi$ with respect to F. Thus,
\[ \lim_{n \to \infty} E_Q(\theta_n) = \lim_{n \to \infty} E_Q(\psi_n) = \int_{-\infty}^{\infty} \varphi(x) \, dF(x). \]
Since for each n, $0 \le \theta_n \le \varphi \le \psi_n$, we have by part (iv) of Theorem 9 that
$E_Q(\theta_n) \le E_Q(\varphi) \le E_Q(\psi_n)$, and the desired conclusion follows.
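
The squeezing argument above can be reproduced numerically. The sketch below is an illustration under assumptions chosen for convenience and not taken from the text: F is the exponential distribution function with rate 1, $\varphi(x) = x(1 - x)$ on [0, 1] and 0 elsewhere, a = -1, b = 2, and the refining partitions are dyadic. The function name `lower_and_upper` is hypothetical.

```python
import math


def F(x):
    """Distribution function of an exponential distribution with rate 1."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0


def phi(x):
    """Nonnegative, bounded, and zero outside the closed interval [0, 1]."""
    return x * (1.0 - x) if 0.0 <= x <= 1.0 else 0.0


def lower_and_upper(n, a=-1.0, b=2.0):
    """Lower and upper Riemann-Stieltjes sums of phi with respect to F for
    the partition of [a, b] into 3 * 2**n equal pieces.

    For n >= 1 the points 0, 1/2 and 1 belong to the partition, so phi is
    continuous and monotone on each piece; its infimum and supremum over
    (r, s] are then the smaller and larger of phi(r) and phi(s).
    """
    pieces = 3 * 2**n
    h = (b - a) / pieces
    lower = upper = 0.0
    for i in range(pieces):
        r, s = a + i * h, a + (i + 1) * h
        mass = F(s) - F(r)
        lower += min(phi(r), phi(s)) * mass
        upper += max(phi(r), phi(s)) * mass
    return lower, upper


for n in range(1, 11):
    print(n, *lower_and_upper(n))
```

The lower sums play the role of $E_Q(\theta_n)$ and the upper sums that of $E_Q(\psi_n)$; the gap between them shrinks as the partition is refined, trapping the expectation between them.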


We have done most of the work involved in proving the following theorem. 


Theorem 17. Let X be an $\mathbb{R}$-valued random variable with distribution
function F, and let $\varphi$ be an $\mathbb{R}$-valued function that is Riemann-Stieltjes integrable
with respect to F on every bounded interval. If $E(\varphi \circ X)$ exists, then
\[ E(\varphi \circ X) = \int_{-\infty}^{\infty} \varphi(x) \, dF(x). \]


Problem 27. Complete the proof of the preceding theorem. In particular, you will
need to treat functions $\varphi$ that are not necessarily 0 outside some bounded interval
and functions $\varphi$ that take both positive and negative values.


Corollary 18. Let X be an $\mathbb{R}$-valued random variable with distribution
function F. Then
\[ E(X) = \int_{-\infty}^{\infty} x \, dF(x), \]
in the sense that if one side exists, then both sides exist and are equal.


Problem 28. Except for the assertion that the existence of $\int_{-\infty}^{\infty} x \, dF(x)$ (possibly
$\infty$ or $-\infty$) entails the existence of E(X), the corollary is an immediate consequence
of Theorem 17 and Theorem 15. Finish the proof of the corollary.




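Example 3. Let X be a random variable with a gamma distribution function
F (defined in Example 2 of Chapter 3). Fix $\varepsilon > 0$. By Corollary 18
\[
\begin{aligned}
E(X) &= \int_{-\infty}^{\infty} x \, dF(x) \\
     &= \int_{-\infty}^{0} x \, dF(x) + \int_{0}^{\varepsilon} x \, dF(x) + \int_{\varepsilon}^{\infty} x \, dF(x) \\
     &= 0 + \int_{0}^{\varepsilon} x \, dF(x) + \int_{\varepsilon}^{\infty} \frac{(ax)^{\gamma} e^{-ax}}{\Gamma(\gamma)} \, dx .
\end{aligned}
\]
The next to last integral is positive but no larger than $\varepsilon F(\varepsilon)$, which approaches
0 as $\varepsilon \searrow 0$. Therefore, as $\varepsilon \searrow 0$ the last integral approaches E(X):
\[ E(X) = \int_{0}^{\infty} \frac{(ax)^{\gamma} e^{-ax}}{\Gamma(\gamma)} \, dx = \frac{1}{a} \, \frac{\Gamma(\gamma + 1)}{\Gamma(\gamma)} = \frac{\gamma}{a} . \]
The reason for the introduction of $\varepsilon$ into the discussion is that, for $0 < \gamma < 1$, X
does not have a continuous bounded density, so the Riemann-Stieltjes integral
cannot, in that case, be transformed directly into a Riemann integral. (In order
to have emphasized the dependence on $\gamma$ throughout we could have placed the
subscript $\gamma$ on F and on E.)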


* Problem 29. Calculate the expected value of a normally distributed random vari- 
able. 


* Problem 30. Calculate the mean of an arbitrary beta distribution. 


It is important to note that Theorem 17 does not apply directly when $E(\varphi \circ X)$
does not exist. As the following counterexample shows, it is possible for the
Riemann-Stieltjes integral to exist even when the expected value does not.


(Counter)example 4. Let us try to calculate $E(X^+ \sin X)$, where X is a
Cauchy random variable (defined in Problem 8 of Chapter 3). Theorem 17 might
lead us to believe that the answer is
\[ \int_{0}^{\infty} \frac{x \sin x}{\pi (1 + x^2)} \, dx . \]
The alternating series test can be used to show that this improper integral
converges to a finite value. However, comparison with 1/x shows that
\[ \int_{0}^{\infty} \frac{x (\sin x)^+}{\pi (1 + x^2)} \, dx = \int_{0}^{\infty} \frac{x (\sin x)^-}{\pi (1 + x^2)} \, dx = \infty . \]
Hence, by Theorem 17,
\[ E((X^+ \sin X)^+) = E((X^+ \sin X)^-) = \infty, \]
so $E(X^+ \sin X)$ does not exist. Thus, it is possible for the Riemann-Stieltjes
integral $\int_{-\infty}^{\infty} \varphi(x) \, dF(x)$ in Theorem 17 to exist even though $E_Q(\varphi)$ does not
exist.

Besides pointing out the dangers of applying Theorem 17 without verifying its
hypothesis, this example also illustrates another important point. By considering
$\varphi^+ \circ X$ and $\varphi^- \circ X$ separately, Theorem 17 can be used to show that $E(\varphi \circ X)$
does not exist, even though the formula in the theorem does not apply directly
to $\varphi \circ X$.
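
The behavior described in this counterexample can be observed numerically. The sketch below is a rough illustration, not a proof; the helper names are hypothetical, and the midpoint rule with 100 points per half period is an arbitrary choice. It integrates $x \sin x / (\pi(1 + x^2))$ over successive intervals $[k\pi, (k+1)\pi]$, on each of which $\sin$ has constant sign, and accumulates the signed integral together with the integrals of the positive and negative parts.

```python
import math


def integrand(x):
    """x sin(x) times the standard Cauchy density 1 / (pi (1 + x^2))."""
    return x * math.sin(x) / (math.pi * (1.0 + x * x))


def half_period_integrals(periods, points=100):
    """Midpoint-rule integrals of the integrand over [k pi, (k + 1) pi]
    for k = 0, ..., periods - 1. The sign of sin is constant on each one."""
    h = math.pi / points
    return [
        h * sum(integrand(k * math.pi + (i + 0.5) * h) for i in range(points))
        for k in range(periods)
    ]


def summaries(periods):
    pieces = half_period_integrals(periods)
    signed = sum(pieces)                           # integral of phi dF
    positive = sum(p for p in pieces if p > 0)     # integral of phi^+ dF
    negative = -sum(p for p in pieces if p < 0)    # integral of phi^- dF
    return signed, positive, negative


for periods in (10, 100, 1000):
    print(periods, *summaries(periods))
```

As the upper limit grows, the signed integral settles down while the positive and negative parts each keep growing, roughly like a logarithm of the upper limit.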


Of course, there are many situations in which $E(\varphi \circ X)$ exists, even as a finite
number, and $\int_{-\infty}^{\infty} \varphi(x) \, dF(x)$ does not exist, because $\varphi$ may not be
Riemann-Stieltjes integrable with respect to the distribution function F. It is not wise
to use $\int_{-\infty}^{\infty} \varphi(x) \, dF(x)$ as a synonym for $E(\varphi \circ X)$ in such cases because the
symbolism $\int \varphi(x) \, dF(x)$ suggests that the usual integration techniques of
calculus apply. In particular, the technique of integration by parts requires that the
relevant integrals exist in the Riemann-Stieltjes sense.

We now use integration by parts to obtain a useful formula for calculations of 
expected values connected with nonnegative random variables. 


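Proposition 19. Let F be the distribution function of an $\mathbb{R}^+$-valued random
variable X, and let $\varphi \colon \mathbb{R} \to \mathbb{R}$ be monotonic and left-continuous. Then the
expectation of $\varphi \circ X$ exists as a member of $\overline{\mathbb{R}}$, and
(4.6) \[ E(\varphi \circ X) = \varphi(0) + \int_{0}^{\infty} [1 - F(x)] \, d\varphi(x). \]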


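PROOF. We assume that $\varphi$ is decreasing; the result for $\varphi$ increasing then
follows by considering $-\varphi$. For $\varphi$ decreasing, it is easy to see that each of the
left and right sides of (4.6) exists as a member of $[-\infty, \infty)$.

Let $\varepsilon > 0$. Since $X \ge 0$, F(x) = 0 for $x \le -\varepsilon$. It follows from Theorem 17
and integration by parts that
(4.7)
\[
\begin{aligned}
E(\varphi \circ X) &= \int_{-\infty}^{\infty} \varphi(x) \, dF(x) \\
  &= \lim_{M \to \infty} \Big( \int_{-\infty}^{-\varepsilon} \varphi(x) \, dF(x) + \int_{-\varepsilon}^{M} \varphi(x) \, dF(x) \Big) \\
  &= \lim_{M \to \infty} \Big( - \int_{-\varepsilon}^{M} \varphi(x) \, d[1 - F(x)] \Big) \\
  &= \lim_{M \to \infty} \Big( -\varphi(M)[1 - F(M)] + \varphi(-\varepsilon) + \int_{-\varepsilon}^{M} [1 - F(x)] \, d\varphi(x) \Big) .
\end{aligned}
\]
If $E(\varphi \circ X) = -\infty$, then $\varphi(M) \to -\infty$ as $M \to \infty$, so $-\varphi(M)[1 - F(M)] \ge 0$
for all sufficiently large M. It follows from (4.7) that $\int_{-\varepsilon}^{\infty} [1 - F(x)] \, d\varphi(x) = -\infty$.
Thus, (4.6) is correct when $E(\varphi \circ X) = -\infty$.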




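Now suppose that $E(\varphi \circ X) \in \mathbb{R}$. We plan to show that $\varphi(M)[1 - F(M)] \to 0$
as $M \to \infty$ in order to conclude from (4.7) that
\[ E(\varphi \circ X) = \varphi(-\varepsilon) + \int_{-\varepsilon}^{\infty} [1 - F(x)] \, d\varphi(x), \]
for then the desired conclusion follows by letting $\varepsilon \searrow 0$.
It is clear that $\varphi(M)[1 - F(M)] \to 0$ as $M \to \infty$ if $\lim_{x \to \infty} \varphi(x) > -\infty$.
Thus, we suppose that $\varphi(x) \searrow -\infty$ as $x \nearrow \infty$. Then, for sufficiently large M,
\[ 0 \ge \varphi(M)[1 - F(M)] \ge E\big((\varphi \cdot I_{(M, \infty)}) \circ X\big), \]
which, by Corollary 13 of the Monotone Convergence Theorem, approaches 0 as
$M \to \infty$. □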


We highlight the most important special case of the preceding proposition as 
a corollary. 


Corollary 20. For an $\mathbb{R}^+$-valued random variable X,
\[ E(X) = \int_{0}^{\infty} [1 - F(x)] \, dx, \]
where F is the distribution function of X. If X is $\mathbb{Z}^+$-valued, then
\[ E(X) = \sum_{k=1}^{\infty} P(\{\omega : X(\omega) \ge k\}). \]
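
The second formula of the corollary is easy to verify exactly on a small example. The sketch below is an illustration under an assumption not found in the text: X is the larger of the two faces shown by a pair of fair dice, so that X takes values in {1, ..., 6}. The helper name `prob` is hypothetical.

```python
from fractions import Fraction
from itertools import product

# X is the larger of two fair die faces, so X takes values in {1, ..., 6}.
outcomes = list(product(range(1, 7), repeat=2))
p_point = Fraction(1, len(outcomes))


def prob(event):
    """P of the set of outcomes w for which event(w) is true."""
    return sum(p_point for w in outcomes if event(w))


# E(X) from the definition for a simple random variable.
mean_direct = sum(max(w) * p_point for w in outcomes)

# E(X) as the sum over k >= 1 of P(X >= k); the terms vanish for k > 6.
mean_from_tails = sum(prob(lambda w, k=k: max(w) >= k) for k in range(1, 7))

print(mean_direct, mean_from_tails)
```

Both computations use exact rational arithmetic, so the two values agree exactly rather than approximately.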


* Problem 31. Let k > 0. For an exponential random variable X with density
$x \mapsto k e^{-kx}$, $x \in \mathbb{R}^+$, use the preceding corollary to calculate E(X) and use
Proposition 19 to calculate $E(\exp \circ X)$.


Problem 32. For a geometrically distributed random variable X, use Corollary 20 
to calculate E(X), previously calculated when solving Problem 22. 


Problem 33. Calculate $E(X^r)$ for r > 0 and X having a standard uniform
distribution in two different ways: by using Theorem 17 and by
using Proposition 19.


Problem 34. For X a random variable having the Cantor distribution, calculate
E(X) and $E(X^2)$.


* Problem 35. Calculate the expected length of a random chord of a circle under 
various reasonable interpretations of ‘random’ (see Example 3 of Chapter 2 and 
Example 4 of Chapter 3). Compare your different answers and discuss the relations 
among them. 


CHAPTER 5 
Expectations: Applications 


Expectations are amazingly useful in the study of random variables and their dis- 
tributions. Some of the reasons for this statement are contained in this chapter. 
In the first section, we introduce the ‘variance’ of an R-valued random variable. 
Variance is used to obtain one version of the Law of Large Numbers, also known 
informally as the Law of Averages. The ‘covariance’ of two random variables is 
also presented. In Section 2, variance and covariance are defined for $\mathbb{R}^d$-valued
random variables, and Section 3 concerns the expectations of various functions
of $\mathbb{R}$-valued random variables. The chapter concludes with a discussion of
‘probability generating functions’, used in the study of distributions on $\mathbb{Z}^+$. Several
useful inequalities, including those of Chebyshev, Cauchy-Schwarz, and Jensen, 
are scattered throughout. 


5.1. Variance and the Law of Large Numbers 


Let X be a random variable with finite expected value $\mu$, and consider the
function $\varphi(x) = (x - \mu)^2$. The quantity $E(\varphi \circ X)$, denoted by Var(X), is called
the variance of X. It is one measure of the degree to which X differs from its
mean. A random variable has zero variance if and only if it is almost surely
equal to a finite constant.


Proposition 1. Let X be a random variable with finite mean $\mu$. Then
\[ \mathrm{Var}(X) = E(X^2) - \mu^2, \]
whether finite or infinite.
PROOF.
\[ E((X - \mu)^2) = E(X^2 - 2\mu X + \mu^2) = E(X^2) - 2\mu E(X) + \mu^2
   = E(X^2) - 2\mu^2 + \mu^2 = E(X^2) - \mu^2. \] □




Example 1. Let us calculate the variance of the Poisson distribution. For a
Poisson random variable X having the parameter $\lambda$, we obtain from Problem 23
of Chapter 4 that $E(X) = \lambda$ and $E(X^2) = \lambda^2 + \lambda$. In view of the preceding
proposition, the variance equals $\lambda$.
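
The two moments quoted in this example, and the resulting variance, can be checked by summing the defining series numerically. The sketch below is only an illustration; the function name `poisson_moments`, the truncation at 200 terms, and the parameter value 3.5 are arbitrary choices.

```python
import math


def poisson_moments(lam, terms=200):
    """Approximate E(X) and E(X^2) for a Poisson random variable with
    parameter lam by summing the first `terms` terms of each series."""
    pmf = math.exp(-lam)   # P(X = 0)
    mean = second = 0.0
    for k in range(1, terms):
        pmf *= lam / k     # P(X = k) obtained from P(X = k - 1)
        mean += k * pmf
        second += k * k * pmf
    return mean, second


lam = 3.5
mean, second = poisson_moments(lam)
variance = second - mean**2   # Proposition 1: Var(X) = E(X^2) - mu^2
print(mean, second, variance)
```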


Problem 1. Let X be a random variable with finite variance. Prove that for all
real numbers a, $E((X - a)^2) \ge \mathrm{Var}(X)$.


Problem 2. Prove the following Monotone Convergence Theorem for variances:
Let $(X_1, X_2, \dots)$ be an increasing sequence of $\overline{\mathbb{R}}^+$-valued random variables on a
common probability space $(\Omega, \mathcal{F}, P)$. Set $X = \lim_{n \to \infty} X_n$. If $E(X) < \infty$, then
$\mathrm{Var}(X) = \lim_{n \to \infty} \mathrm{Var}(X_n)$.


The nonnegative square root of the variance is called the standard deviation.
It has the same physical units as the mean. If the variance is infinite, then we also
say that the standard deviation is infinite. If the mean is infinite or undefined,
we say that the standard deviation and variance are undefined.


Problem 3. Other than probability generating functions, which will be defined
in the last section of this chapter, verify the entries in Table 5.1 that have not
previously been calculated.

Distribution on $\mathbb{Z}^+$ | probability assigned to $\{k\}$ | mean | variance | prob. gen. function: $s \mapsto$
Binomial | $\binom{n}{k} p^k (1 - p)^{n-k}$ | | | $[ps + (1 - p)]^n$
Poisson | | | |
Geometric | | | |

TABLE 5.1. Basic facts about some distributions on $\mathbb{Z}^+$




Problem 4. Verify the entries in Table 5.2 that have not previously been calculated.

Distribution on $\mathbb{R}$ | density: $x \mapsto$ | mean | variance
Exponential | | |
Gaussian | | |
Normal | another name for Gaussian

TABLE 5.2. Basic facts about some distributions with densities


Problem 5. Let X be a random variable with finite mean $\mu$ and standard deviation
$\sigma$. Let Y = aX + b, where a and b are real constants. Prove that the mean of Y
is $a\mu + b$ and that the standard deviation of Y is $|a|\sigma$. Notice, in particular, that
if $a = 1/\sigma$ and $b = -a E(X)$, then Y is of the same type as X, the mean of Y is 0,
and the variance and standard deviation of Y are both equal to 1.


Problem 6. Let $X_k$ be the random function defined in Example 2 of Chapter 2,
k = 1,2,.... Show that
\[ \mathrm{Var}(X_k(j/k)) = j/k, \quad \text{for } j = 1, \dots, k. \]
Comment on the use of the term ‘scaling factor’ in Example 2 of Chapter 2.


Problem 7. Calculate the variances of the various distributions arising in connec- 
tion with Example 4 of Chapter 3, the random chord example. This problem is a 
continuation of Problem 35 of Chapter 4. 




We turn to a very simple inequality that relates the variance of a random 
variable to the probability that the random variable deviates a given amount 
from its mean. The inequality is quite crude in the sense that for many distri- 
butions that arise in practice, the bound given can be considerably improved by 
other methods. Nevertheless, because of its great generality, it is an important 
theoretical tool that will appear in many proofs. 


Proposition 2. [Chebyshev Inequality] If X is an $\mathbb{R}$-valued random variable
with finite mean $\mu$ and standard deviation $\sigma$, then
\[ P(\{\omega : |X(\omega) - \mu| \ge z\}) \le (\sigma / z)^2 \]
for each z > 0.
PROOF. Let $A = \{\omega : |X(\omega) - \mu| \ge z\}$. Then
\[ \sigma^2 = E((X - \mu)^2) \ge E(I_A (X - \mu)^2) \ge E(I_A z^2) = z^2 P(A). \]
Now divide by $z^2$ to obtain the desired result. □
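
The Chebyshev Inequality can be compared with exact tail probabilities on a small example. The sketch below is an illustration under an assumption not found in the text: X is the face shown by one fair die. The helper names `tail` and `chebyshev_bound` are hypothetical.

```python
from fractions import Fraction

# X is the face shown by one fair die.
values = range(1, 7)
p = Fraction(1, 6)

mu = sum(x * p for x in values)
var = sum((x - mu) ** 2 * p for x in values)


def tail(z):
    """P(|X - mu| >= z), computed exactly."""
    return sum(p for x in values if abs(x - mu) >= z)


def chebyshev_bound(z):
    """The bound (sigma / z)^2 = Var(X) / z^2."""
    return var / z**2


for z in (Fraction(1, 2), Fraction(3, 2), Fraction(5, 2), Fraction(3)):
    print(z, tail(z), chebyshev_bound(z))
```

For small z the bound exceeds 1 and carries no information, which illustrates the remark that the inequality is crude for many distributions.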


Problem 8. Give a sense in which the Chebyshev Inequality is best possible. Prove 
your assertions. 


The next result is an easy generalization of the Chebyshev Inequality. 


Proposition 3. [Markov Inequality] Let Y be an $\mathbb{R}$-valued random variable,
and let f be an $\mathbb{R}^+$-valued function which is increasing on some interval $J \subseteq \mathbb{R}$
containing the support of Y. Then, for all $z \in J$ such that f(z) > 0,
\[ P(\{\omega : Y(\omega) \ge z\}) \le E(f \circ Y) / f(z). \]


Problem 9. Prove the preceding proposition. For what choice of Y and f is the 
Markov Inequality equivalent to the Chebyshev Inequality? 


Problem 10. For z > 0, use the Markov Inequality to obtain one upper bound for
$P(|X| \ge z)$ in terms of $E(|X|)$ and another in terms of $E(X^2)$, and also an upper
bound for $P(X \ge z)$ in terms of $E(X^+)$.


Proposition 4. [Cauchy-Schwarz Inequality] Let X and Y be two $\mathbb{R}$-valued
random variables defined on a probability space $(\Omega, \mathcal{F}, P)$. Assume that $E(X^2)$
and $E(Y^2)$ are both finite. Then E(XY) exists and
\[ [E(XY)]^2 \le E(X^2) E(Y^2). \]
Moreover, equality holds if and only if one of the random variables X or Y almost
surely equals a constant multiple of the other.




PROOF. If either X or Y is almost surely a multiple of the other, the result is
an immediate consequence of linearity. Suppose that neither X nor Y is almost
surely a multiple of the other. For each real a,
\[ E((aX - Y)^2) - a^2 E(X^2) - E(Y^2) \]
is meaningful and greater than $-\infty$. By linearity, this quantity equals $-2E(XY)$
when a = 1 and $2E(XY)$ when a = -1. We conclude that E(XY) exists and is
finite. Applying linearity again, we have
(5.1) \[ E((aX - Y)^2) = a^2 E(X^2) - 2a E(XY) + E(Y^2), \]
which is a quadratic function of a. The left side of (5.1) is positive for all a, so
the discriminant of the quadratic, $4[E(XY)]^2 - 4E(X^2)E(Y^2)$, must be negative.
The desired inequality follows. □


One consequence of the Cauchy-Schwarz Inequality is the following, which 
provides a useful complement to the Chebyshev Inequality because it gives an 
inequality in the opposite direction: 


Corollary 5. Let X be an $\mathbb{R}$-valued random variable such that $0 < E(X)$ and
$0 < E(X^2) < \infty$, and let $\lambda \in [0, 1]$. Then
\[ P(\{\omega : X(\omega) > \lambda E(X)\}) \ge (1 - \lambda)^2 \, \frac{[E(X)]^2}{E(X^2)} . \]
PROOF. Let A be the event $\{\omega : X(\omega) > \lambda E(X)\}$. By the linearity and
positivity properties of the expectation and by the Cauchy-Schwarz Inequality,
\[ E(X) = E(X I_A) + E(X I_{A^c}) \le \sqrt{E(X^2) P(A)} + \lambda E(X), \]
from which the desired inequality follows. □


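Problem 11. Prove the following: Let X be an $\mathbb{R}$-valued random variable with
finite mean $\mu$ and standard deviation $\sigma$. Then for $0 \le z \le \sigma$,
\[ P(\{\omega : |X(\omega) - \mu| \ge z\}) \ge \Big(1 - \frac{z^2}{\sigma^2}\Big)^2 \frac{\sigma^4}{E((X - \mu)^4)} . \]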


For random variables X and Y defined on a common probability space and 
having finite means, we define their covariance by 


Cov(X, Y) = E[(X — E(X))(Y — E(Y))]. 


Note that Cov(X, X) = Var(X). Of course, the covariance need not be finite, or 
even exist. However, it is finite if both random variables have finite variance. 


Corollary 6. If X and Y are random variables satisfying the conditions of
the Cauchy-Schwarz Inequality, then Cov(X,Y) is a well-defined finite number, and
\[ [\mathrm{Cov}(X, Y)]^2 \le \mathrm{Var}(X) \, \mathrm{Var}(Y). \]




PROOF. Replace X by X - E(X) and Y by Y - E(Y) in the Cauchy-Schwarz
Inequality. □


Proposition 7. Let X, Y, and Z be $\mathbb{R}$-valued random variables defined on a
probability space $(\Omega, \mathcal{F}, P)$. Assume that X, Y, and Z all have finite variances.
Then
(i) Cov(X, Y) = E(XY) - E(X)E(Y);
(ii) Cov(X, X) $\ge$ 0;
(iii) Cov(X, Y) = Cov(Y, X);
(iv) Cov(aX + bY, Z) = a Cov(X, Z) + b Cov(Y, Z) for all real a and b.


Problem 12. Prove the preceding proposition. 


* Problem 13. Find a formula for Cov(aX + b,cY + d) in terms of a,b,c,d, and 
Cov(X,Y). 


Properties (iii) and (iv) of Proposition 7 have the following consequence. 


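Corollary 8. Let $(X_1, \dots, X_m)$ and $(Y_1, \dots, Y_n)$ be sequences of random
variables defined on a common probability space and having finite variances. Then
\[ \mathrm{Cov}\Big(\sum_{j=1}^{m} X_j, \sum_{k=1}^{n} Y_k\Big) = \sum_{j=1}^{m} \sum_{k=1}^{n} \mathrm{Cov}(X_j, Y_k). \]
In particular,
\[ \mathrm{Var}\Big(\sum_{j=1}^{m} X_j\Big) = \sum_{j=1}^{m} \sum_{k=1}^{m} \mathrm{Cov}(X_j, X_k). \]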


The correlation between two random variables X and Y is defined by
\[ \mathrm{Corr}(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X) \, \mathrm{Var}(Y)}} \]
whenever the covariance of X and Y is defined. (In this definition, 0/0 = 0.)
Random variables X and Y are said to be positively correlated, negatively
correlated, or uncorrelated if their correlation is defined and is positive, negative, or
zero, respectively.
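
Covariance and correlation can be computed exactly on a small finite probability space. The sketch below is an illustration under assumptions not found in the text: the space consists of the 36 equally likely ordered pairs of die faces, X is the first face, and Y is the sum of both faces. The helper names `E` and `cov` are hypothetical.

```python
from fractions import Fraction
from itertools import product
import math

# Two fair die faces; X is the first face and Y is the sum of both.
outcomes = list(product(range(1, 7), repeat=2))
p_point = Fraction(1, len(outcomes))


def E(g):
    """Exact expectation of the random variable w -> g(w)."""
    return sum(g(w) * p_point for w in outcomes)


def X(w):
    return w[0]


def Y(w):
    return w[0] + w[1]


def cov(U, V):
    """Cov(U, V) = E[(U - E(U)) (V - E(V))]."""
    mu_u, mu_v = E(U), E(V)
    return E(lambda w: (U(w) - mu_u) * (V(w) - mu_v))


cov_xy = cov(X, Y)
var_x, var_y = cov(X, X), cov(Y, Y)
corr = float(cov_xy) / math.sqrt(float(var_x * var_y))

print(cov_xy, var_x, var_y, corr)
```

Here X and Y are positively correlated, and the computed values can be used to confirm property (i) of Proposition 7 and the inequality of Corollary 6 for this pair.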


* Problem 14. Let X be an $\mathbb{R}$-valued random variable and let $\varphi \colon \mathbb{R} \to \mathbb{R}$ be an
increasing function. Assume that X and $\varphi \circ X$ both have finite variance. Show
that $\mathrm{Cov}(X, \varphi \circ X) \ge 0$.


Problem 15. Prove that whenever the correlation between two random variables
is defined, it lies in the interval [-1,1]. Further prove that if the two random
variables each have nonzero variance, then the correlation equals $\pm 1$ if and only if
one of the random variables is almost surely a linear function of the other. Also
show that correlation is ‘dimensionless’ and translation invariant:
\[ \mathrm{Corr}(aX + b, Y) = \mathrm{Corr}(X, Y) \]
for all real numbers a and b such that a > 0.


Problem 16. Let $X_1, \dots, X_n$ be pairwise uncorrelated $\mathbb{R}$-valued random variables.
Prove that
\[ \mathrm{Var}\Big(\sum_{i=1}^{n} X_i\Big) = \sum_{i=1}^{n} \mathrm{Var}(X_i). \]


* Problem 17. Calculate the variance of the number of cards whose positions equal 
their labels for each of the settings in Example 2 and Problem 18 both of Chapter 4. 


We now have the tools necessary to prove a simple version of a Law of Large 
Numbers, also known as a ‘Law of Averages’. The setting for this law is one 
in which an experiment is repeated a large number of times. The outcome of 
each trial (or of a measurement taken after each trial) is a real number. In other 
words, we consider a sequence (X1, X2,...) of random numbers. These random 
variables are all assumed to have the same distribution, which is a mathematical 
way of saying that the same experiment is performed at each trial. After per- 
forming the experiment n times, we take the average of the outcomes, obtaining 
the random variable $\sum_{k=1}^{n} X_k / n$. The Law of Large Numbers states that for large
n, this average is likely to be close to the mean of the common distribution of the
random variables $X_k$, provided that the random variables are either negatively
correlated or uncorrelated. A stronger version of this result will be proved in 
Chapter 12. 


Proposition 9. [Law of Large Numbers] Let $(X_1, X_2, \dots)$ be a sequence of
identically distributed $\mathbb{R}$-valued random variables defined on a common probability
space $(\Omega, \mathcal{F}, P)$. Suppose that the common distribution of the random variables
has finite mean $\mu$ and finite variance $\sigma^2$. Further assume that each pair of
random variables in the sequence is either uncorrelated or negatively correlated.
Then, for each $\varepsilon > 0$,
\[ \lim_{n \to \infty} P\Big(\Big\{\omega : \Big| \frac{\sum_{k=1}^{n} X_k(\omega)}{n} - \mu \Big| > \varepsilon \Big\}\Big) = 0. \]


PROOF. Let $S_n = \sum_{k=1}^{n} X_k$. Then $E(S_n/n) = \mu$ and $\mathrm{Var}(S_n) \le n\sigma^2$. (Why?)
Thus $\mathrm{Var}(S_n/n) \le \sigma^2/n$, and so the Chebyshev Inequality implies that
\[ P(\{\omega : |S_n(\omega)/n - \mu| > \varepsilon\}) \le \sigma^2 / (n \varepsilon^2). \]
Since the right side goes to 0 as $n \to \infty$, the proof is complete. □
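
The bound obtained in the proof can be compared with exact probabilities in a simple case. The sketch below is an illustration under assumptions not found in the text: the summands are indicators of independent trials with success probability p (independence is assumed here only to make the distribution of $S_n$ computable; it makes the summands uncorrelated), so that $S_n$ has a binomial distribution. The function name `deviation_probability` is hypothetical.

```python
from fractions import Fraction
from math import comb


def deviation_probability(n, p, eps):
    """Exact P(|S_n / n - p| > eps) when S_n is the number of successes in
    n independent trials with success probability p (so the summands are
    uncorrelated, with mean p and variance p (1 - p))."""
    total = Fraction(0)
    for k in range(n + 1):
        if abs(Fraction(k, n) - p) > eps:
            total += comb(n, k) * p**k * (1 - p) ** (n - k)
    return total


p = Fraction(1, 2)
eps = Fraction(1, 10)
for n in (10, 100, 1000):
    exact = deviation_probability(n, p, eps)
    bound = p * (1 - p) / (n * eps**2)   # sigma^2 / (n eps^2)
    print(n, float(exact), float(bound))
```

The exact probabilities fall well below the Chebyshev bound, which is consistent with the earlier remark that the inequality is crude but general.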




Problem 18. Find an example to show that the correlation condition in the Law 
of Large Numbers cannot be eliminated. 


Problem 19. The proof of Proposition 9 implies more than is stated in the proposi- 
tion. State a more general proposition to which the proof of Proposition 9 applies. 


Problem 20. Let $(A_1, A_2, \dots)$ be events in a probability space $(\Omega, \mathcal{F}, P)$. Assume
that all of the events $A_i$ have the same probability p. For n = 1,2,..., let $M_n(\omega)$ be
the number of events in the finite sequence $(A_1, \dots, A_n)$ which contain the sample
point $\omega$. Find conditions on the quantities $P(A_i \cap A_j)$ for $i \ne j$ that imply
\[ \lim_{n \to \infty} P\Big(\Big\{\omega : \Big| \frac{M_n(\omega)}{n} - p \Big| > \varepsilon \Big\}\Big) = 0 \]
for all $\varepsilon > 0$. Hint: Use indicator random variables.


5.2. Mean vectors and covariance matrices 


Let $X = (X_1, X_2, \dots, X_d)$ be an $\mathbb{R}^d$-valued random variable. If the means of the
coordinate random variables $X_i$ exist, the vector $(E(X_1), E(X_2), \dots, E(X_d))$ is
called the mean vector of X, and, if all the variances are finite, the matrix with
elements $\mathrm{Cov}(X_i, X_j)$, $i, j = 1, 2, \dots, d$, is the covariance matrix of X. The
mean matrix of a random matrix is the matrix of expectations of the individual
entries. Thus, if m denotes the mean vector of X (regarded as a row vector),
then the covariance matrix of X, sometimes written as Cov(X, X), equals the
mean of the random matrix $(X - m)^T (X - m)$, where ‘T’ denotes transpose;
that is,
\[ \mathrm{Cov}(X, X) = E((X - m)^T (X - m)). \]

The terminology just described is also used for random infinite sequences
$X = (X_1, X_2, \dots)$. For a random function X, such as defined in Example 2 of
Chapter 2, it is often useful to consider the mean function $t \mapsto E[X_t]$ and the
covariance function: $(s, t) \mapsto \mathrm{Cov}(X_s, X_t)$.


Problem 21. Let $(\Omega, \mathcal{F}, P)$ denote the usual probability space for an infinite
sequence of coin-flips. Define $X_i(\omega) = 2\omega_i - 1$, $i = 1, 2, \dots$. Show that the sequence
$(X_1, X_2, \dots)$ is pairwise uncorrelated. Use the formula in Problem 16 to repeat
Problem 6. Then find the covariance function of each of the random functions X; 
mentioned in Problem 6. What is the limit of the covariance function as $k \to \infty$?


Proposition 10. Let $X$ be an $\mathbb{R}^d$-valued random variable with mean equal to
the zero vector $0_d$ and finite covariance matrix $\Sigma$. Let $A$ be some nonrandom
$k \times d$ matrix for some $k$. Then the random vector

\[ X A^T = [A X^T]^T \]

has mean $0_k$ and covariance matrix $A \Sigma A^T$.




PROOF. By linearity of expectation, coordinate by coordinate,

\[ E(X A^T) = E(X) A^T = 0_d A^T = 0_k. \]

Again using linearity of expectation we obtain the covariance matrix of $X A^T$:

\[ E\bigl([X A^T]^T X A^T\bigr) = E(A X^T X A^T) = A\, E(X^T X)\, A^T = A \Sigma A^T, \]

as desired. □
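
The identity in Proposition 10 can be checked exactly on a small example. The following is only a sketch under stated assumptions: the four-point support and the matrix `A` are hypothetical choices made for illustration (they do not come from the text), $X$ is taken to be uniform on the support so that its mean is the zero vector, and the helper names are ours.

```python
from fractions import Fraction

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    # Ordinary matrix product of two matrices given as lists of rows.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def covariance(points):
    """Exact covariance matrix of a row vector that is uniform on `points`."""
    n, d = len(points), len(points[0])
    mean = [sum(Fraction(p[i]) for p in points) / n for i in range(d)]
    return [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / n
             for j in range(d)] for i in range(d)]

# Hypothetical example: X uniform on four points of R^2; the points sum to zero.
support = [(2, 1), (-2, -1), (1, -3), (-1, 3)]
A = [[1, 2], [0, -1], [3, 1]]            # a nonrandom 3 x 2 matrix (k = 3, d = 2)

sigma = covariance(support)              # covariance matrix of X
# Each possible value of X A^T is the row vector x A^T.
image = [tuple(matmul([list(x)], transpose(A))[0]) for x in support]
lhs = covariance(image)                          # covariance matrix of X A^T
rhs = matmul(matmul(A, sigma), transpose(A))     # A Sigma A^T
print(lhs == rhs)
```

Exact rational arithmetic is used so that the comparison is an equality rather than an approximate match.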


Problem 22. Use the preceding proposition to prove that if $X$ has a finite covariance
matrix $\Sigma$, then the variance of the random inner product $\langle z, X \rangle$ equals $z \Sigma z^T$,
for any constant $z \in \mathbb{R}^d$.


Problem 23. Observe that covariance matrices are necessarily symmetric, and then 
use the preceding problem to prove that every eigenvalue of a covariance matrix is 
nonnegative. 


A square matrix is said to be positive definite if every eigenvalue is nonnega- 
tive. It is strictly positive definite if every eigenvalue is positive. According to the 
preceding problem, every covariance matrix is symmetric and positive definite. 
This fact and its converse constitute the following theorem. 


Theorem 11. A $d \times d$ matrix is the covariance matrix of some $\mathbb{R}^d$-valued
random variable if and only if it is symmetric and positive definite.


PROOF. In view of Problem 23 we only need prove the 'if' part. Let
$\Sigma$ be symmetric and positive definite. Denote the eigenvalues of $\Sigma$ (counting
multiplicity) by $v_i^2$, $1 \le i \le d$, where each $v_i \ge 0$. Since $\Sigma$ is symmetric there
exists a matrix $O$ such that $O^{-1} = O^T$ and

\[ O \Sigma O^{-1} = \begin{pmatrix} v_1^2 & 0 & \cdots & 0 \\ 0 & v_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & v_d^2 \end{pmatrix}. \]

Set

\[ A = O^{-1} \begin{pmatrix} v_1 & 0 & \cdots & 0 \\ 0 & v_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & v_d \end{pmatrix} O. \]

A straightforward calculation shows that $\Sigma = A A^T$.

Let $X_i$, $1 \le i \le d$, be as in Problem 21 (ignoring $X_{d+1}, X_{d+2}, \dots$ as defined
there). By that problem and a simple calculation of variances, the covariance
matrix of $X = (X_1, \dots, X_d)$ is the identity matrix. Insert the identity matrix
for $\Sigma$ in Proposition 10 to obtain that the covariance matrix of $X A^T$ is $A A^T$,
which equals $\Sigma$. □
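
The construction of $A$ in the proof can be sketched numerically in the case $d = 2$. The code below is an illustration under assumptions, not the authors' method: it obtains the orthogonal matrix $O$ from the closed-form rotation angle for a symmetric $2 \times 2$ matrix, the function name `covariance_root` and the sample matrix are hypothetical, and floating-point arithmetic is used, so equalities hold only up to rounding.

```python
import math

def covariance_root(sigma):
    """Return A = O^T diag(v_1, v_2) O for a symmetric 2 x 2 matrix sigma.

    Raises ValueError if sigma has a negative eigenvalue, in which case
    it is not a covariance matrix.
    """
    (a, b), (_, c) = sigma
    theta = 0.5 * math.atan2(2 * b, a - c)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Rows of O are orthonormal eigenvectors of sigma, so O^{-1} = O^T.
    O = [[cos_t, sin_t], [-sin_t, cos_t]]
    # Eigenvalues v_i^2, computed as the quadratic forms (row i) sigma (row i)^T.
    eig = [sum(O[i][r] * sigma[r][s] * O[i][s] for r in range(2) for s in range(2))
           for i in range(2)]
    if min(eig) < -1e-12:
        raise ValueError("matrix has a negative eigenvalue")
    v = [math.sqrt(max(e, 0.0)) for e in eig]
    # Entry (r, s) of O^T diag(v) O.
    return [[sum(O[i][r] * v[i] * O[i][s] for i in range(2)) for s in range(2)]
            for r in range(2)]

sigma = [[2.0, 1.0], [1.0, 2.0]]          # hypothetical example with eigenvalues 3 and 1
A = covariance_root(sigma)
# A A^T, which should reproduce sigma up to rounding error.
product = [[sum(A[r][i] * A[s][i] for i in range(2)) for s in range(2)] for r in range(2)]
print(product)
```

For larger $d$ one would replace the closed-form rotation by a general symmetric eigenvalue routine; that step is outside the scope of this sketch.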




Problem 24. Decide which of the following matrices are covariance matrices. For 
those which are, find the matrix A described in the preceding proof. 


3 2 7 —2V/3 2. 4 
ay i —2/3 ae @ 7 


1 -2 4 0 -2 4 
—2 4 8], —2 3 -8 
4 —8 16 4 -8 15 


The notation $\operatorname{Cov}(X, X)$ for the covariance matrix of the random vector $X$
can be extended to accommodate two random vectors $X$ and $Y$, not necessarily
having the same number of coordinates. The entry in position $(i, j)$ of $\operatorname{Cov}(X, Y)$
is, by definition, $\operatorname{Cov}(X_i, Y_j)$. Thus,

\[ \operatorname{Cov}(X, Y) = E\bigl([X - E(X)]^T [Y - E(Y)]\bigr). \]


Similarly, one can speak of the covariance function of two random functions. 


Problem 25. Let $X_1$, $X_2$, $X_3$, and $X_4$ be as in Problem 21 and set $Y_1 = X_1 + X_2$,
$Y_2 = X_2 + X_3$, $Y_3 = X_3 + X_4$, $Y_4 = X_4 + X_1$, $Y_5 = X_1 + X_3$, $Y_6 = X_2 + X_4$, and
$Y_7 = X_1 + X_2 + X_3 + X_4$. Find the $4 \times 7$ covariance matrix of $X$ and $Y$.


5.3. Moments and the Jensen Inequality 


For an integer $n \ne 0$, $E(X^n)$ is called the $n$th moment of the $\mathbb{R}$-valued random
variable $X$. ($0^n$ is undefined for negative odd values of $n$, and is defined to be $\infty$
for negative even values of $n$ in this definition.) If $E(X^n)$ fails to exist, then we
say that the $n$th moment of $X$ does not exist. For a real number $r \ne 0$, $E(|X|^r)$
is called the $r$th absolute moment of $X$. ($0^r = \infty$ for $r < 0$ in this definition.)
If the mean $\mu$ of $X$ is finite, the $n$th moment of $X - \mu$ is called the $n$th moment
of $X$ about its mean, and the $r$th moment of $|X - \mu|$ is called the $r$th absolute
moment of $X$ about its mean.

The next result can be used to obtain inequalities between different moments
of a random variable. For it we need a definition: an $\mathbb{R}$-valued function $\varphi$ defined
on an interval $J \subseteq \mathbb{R}$ is convex if

\[ \varphi((1 - \lambda)x + \lambda z) \le (1 - \lambda)\varphi(x) + \lambda\varphi(z) \tag{5.2} \]

whenever $x$ and $z$ belong to $J$ and $0 \le \lambda \le 1$. Set $y = (1 - \lambda)x + \lambda z$ and observe
that a geometrical interpretation of (5.2) is that the point $(y, \varphi(y))$ lies below or
on the line segment with endpoints $(x, \varphi(x))$ and $(z, \varphi(z))$. The condition (5.2)
can be rewritten as

\[ (1 - \lambda)[\varphi(y) - \varphi(x)] \le \lambda[\varphi(z) - \varphi(y)]. \tag{5.3} \]

Here $x$, $y$, and $z$ are arbitrary members of $J$ satisfying $x \le y \le z$, and $\lambda$ is
chosen so that $y = (1 - \lambda)x + \lambda z$. If we ignore the cases $y = x$ and $y = z$ for
which (5.2) is trivially satisfied, we can divide (5.3) by $(1 - \lambda)(y - x) = \lambda(z - y)$
to conclude that $\varphi$ is convex on $J$ if and only if

\[ \frac{\varphi(y) - \varphi(x)}{y - x} \le \frac{\varphi(z) - \varphi(y)}{z - y} \tag{5.4} \]

whenever $x < y < z$ belong to $J$. The function $\varphi$ is said to be concave if its
negative is convex.


Problem 26. Show that if $\varphi$ is an $\mathbb{R}$-valued function which is convex or concave on
an interval $J \subseteq \mathbb{R}$, then $\varphi$ is measurable. Hint: First use (5.4) to show that $\varphi$ must
be continuous on the interior of $J$.


Proposition 12. [Jensen Inequality] Let $X$ be a random variable with finite
mean, and let $J$ be an interval that supports the distribution of $X$. Let $\varphi$ be an
$\mathbb{R}$-valued function which is convex on $J$. Then

\[ \varphi(E(X)) \le E(\varphi \circ X). \]


PROOF. The result is obvious if $X$ is almost surely equal to a constant. So,
suppose that this is not the case. Then by property (v) in Theorem 9 of Chapter 4,
$E(X)$ is not an endpoint of $J$. Let $y = E(X)$, and set

\[ M = \sup\Bigl\{ \frac{\varphi(y) - \varphi(u)}{y - u} : u < y \text{ and } u \in J \Bigr\}. \]

The quantity $M$ is finite since it is bounded above by the right side of (5.4) for
some $z > y$, with $z \in J$. It follows from the definition of $M$ and (5.4) that

\[ M(v - E(X)) \le \varphi(v) - \varphi(E(X)) \]

for all $v \in J$. Since $J$ supports the distribution of $X$, we conclude that

\[ M(X - E(X)) \le \varphi \circ X - \varphi(E(X)) \quad \text{a.s.} \]

The result of taking expected values is

\[ 0 \le E(\varphi \circ X) - \varphi(E(X)). \quad □ \]


The reader may find it useful to think about the Jensen Inequality and its
proof for the special case in which the support of $X$ contains just two points, $x_1$
and $x_2$. Let $p = P(\{\omega : X(\omega) = x_1\})$. Then

\[ E(X) = p x_1 + (1 - p) x_2 \quad \text{and} \quad E(\varphi \circ X) = p \varphi(x_1) + (1 - p) \varphi(x_2), \]

and the Jensen Inequality follows immediately from the definition of convexity.
See Figure 5.1, where this argument is illustrated graphically.
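
The finite-support case lends itself to a direct numerical check. The sketch below computes the difference $E(\varphi \circ X) - \varphi(E(X))$ for a random variable with finitely many values; the function name `jensen_gap` and the particular two-point distribution are hypothetical choices made only for illustration.

```python
import math

def jensen_gap(phi, values, probs):
    """E(phi(X)) - phi(E(X)) for X taking values[i] with probability probs[i]."""
    mean = sum(p * x for x, p in zip(values, probs))
    return sum(p * phi(x) for x, p in zip(values, probs)) - phi(mean)

# Hypothetical two-point distribution: P(X = 0) = 1/4 and P(X = 2) = 3/4.
values, probs = [0.0, 2.0], [0.25, 0.75]

gap_exp = jensen_gap(math.exp, values, probs)               # convex function
gap_square = jensen_gap(lambda x: x * x, values, probs)     # convex; the gap is Var(X)
gap_log = jensen_gap(lambda x: math.log(1.0 + x), values, probs)   # concave function
print(gap_exp, gap_square, gap_log)
```

For the convex functions the gap is nonnegative, and for the concave function it is nonpositive, as the proposition applied to the negative of the function predicts.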




FIGURE 5.1. The Jensen Inequality for a random variable $X$
with support $\{x_1, x_2\}$


Problem 27. Prove that if $\varphi \colon J \to \mathbb{R}$ is a convex function and $\lambda_1, \dots, \lambda_n$ are
nonnegative numbers that sum to 1, then for all $x_1, \dots, x_n \in J$,

\[ \varphi(\lambda_1 x_1 + \cdots + \lambda_n x_n) \le \lambda_1 \varphi(x_1) + \cdots + \lambda_n \varphi(x_n). \]

Use this fact to extend the argument given in the preceding paragraph to prove
the Jensen Inequality for random variables $X$ with finite support.


Problem 28. Let $X$ be an $\mathbb{R}$-valued random variable. Use the Jensen Inequality to
prove that the function $f(p) = [E(|X|^p)]^{1/p}$ is an increasing $\mathbb{R}$-valued function for
$0 < p < \infty$. Show that if $X$ is not a constant a.s., then $\lim_{p \to \infty} f(p) = \sup\{|x| : 0 <
F(x) < 1\}$, where $F$ is the distribution function of $X$. Describe the situation in
which $X$ is a constant.


5.4. Probability generating functions 


We conclude this chapter with an important application of expectations to the 
study of distributions of certain random variables. 


CONVENTION. In the probability generating function setting, $0^0 = 1^\infty = 1$,
derivatives of functions defined on $[0, 1]$ are understood to be right derivatives
when taken at 0 and left derivatives when taken at 1, and if a derivative at 1 of
order $m$ equals $\infty$, then all derivatives at 1 of order $n > m$ also equal $\infty$.


Let $X$ be a random variable whose distribution is supported by the set
$\overline{\mathbb{Z}}^+ = \{0, 1, 2, \dots, \infty\}$. The probability generating function of $X$ (and also of
the distribution of $X$) is defined by

\[ \rho(s) = E(s^X) = \begin{cases} \displaystyle\sum_{k=0}^{\infty} P(\{\omega : X(\omega) = k\})\, s^k & \text{if } 0 \le s < 1 \\ 1 & \text{if } s = 1. \end{cases} \]


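Theorem 13. Let $\rho$ be the probability generating function of a $\overline{\mathbb{Z}}^+$-valued
random variable $X$, and let $\rho^{(k)}$ denote the derivative of $\rho$ of order $k$. Then:

\[ P(\{\omega : X(\omega) = \infty\}) = 1 - \rho(1-) = \rho(1) - \rho(1-); \]

\[ P(\{\omega : X(\omega) = k\}) = \frac{\rho^{(k)}(0)}{k!}, \quad k = 0, 1, 2, \dots; \]

\[ E\Bigl(\prod_{m=0}^{n-1} (X - m)\Bigr) = \rho^{(n)}(1), \quad n = 1, 2, \dots. \]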


PROOF. As $s \nearrow 1$ through any sequence, $s^X \nearrow I_{\{\omega : X(\omega) < \infty\}}$. The first
formula in the theorem then follows by the Monotone Convergence Theorem.
The second formula is the standard formula for the coefficients of a power series.
The third formula is obvious if $\rho(1-) < 1$, since in this case, $X$ is infinite with
positive probability. Thus we restrict our attention to the case $\rho(1-) = 1$.

By definition

\[ \rho'(1) = \lim_{s \nearrow 1} \frac{\rho(1) - \rho(s)}{1 - s} = \lim_{s \nearrow 1} E\Bigl(\frac{1 - s^X}{1 - s}\Bigr). \]

As $s$ increases to 1 through an arbitrary sequence, the nonnegative random
variables $(1 - s^X)/(1 - s)$ form an increasing sequence whose limit is $X$. It
follows from the Monotone Convergence Theorem that $\rho'(1) = E(X)$ (whether
finite or infinite), and, therefore, that the third formula holds for $n = 1$. To
continue the proof by induction, assume the formula is true for $n$. Then, with
$p_k = P(\{\omega : X(\omega) = k\})$,

\[
\begin{aligned}
\rho^{(n+1)}(1) &= \lim_{s \nearrow 1} \frac{\rho^{(n)}(1) - \rho^{(n)}(s)}{1 - s} \\
&= \lim_{s \nearrow 1} \frac{E\bigl(\prod_{m=0}^{n-1} (X - m)\bigr) - \sum_{k=n}^{\infty} \frac{k!}{(k-n)!}\, p_k s^{k-n}}{1 - s} \\
&= \lim_{s \nearrow 1} \frac{\sum_{k=n}^{\infty} \frac{k!}{(k-n)!}\, p_k - \sum_{k=n}^{\infty} \frac{k!}{(k-n)!}\, p_k s^{k-n}}{1 - s} \\
&= \lim_{s \nearrow 1} \sum_{k=n}^{\infty} \frac{k!\,(1 - s^{k-n})}{(1 - s)(k - n)!}\, p_k \\
&= \lim_{s \nearrow 1} E\Bigl(\frac{1 - s^{X-n}}{1 - s} \prod_{m=0}^{n-1} (X - m)\Bigr).
\end{aligned}
\]

We have applied Proposition 14 of Chapter 4 twice in this series of equalities.
Note that, because of the inductive hypothesis and the convention concerning
infinite derivatives, these equalities are correct, even in the case that $\rho^{(n)}(1) = \infty$.
To complete the proof, the Monotone Convergence Theorem can now be applied
as above, this time using the fact that $0 \le (1 - s^{X(\omega)-n})/(1 - s) \nearrow (X(\omega) - n)$
as $s \nearrow 1$ for those $\omega$ for which $\prod_{m=0}^{n-1} (X(\omega) - m) \ne 0$. □
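
When $X$ has finite support, its probability generating function is a polynomial, and the second and third formulas of Theorem 13 can be verified by exact differentiation. The sketch below assumes a hypothetical distribution (binomial with parameters 3 and 1/2) and represents the polynomial by its list of coefficients; the helper names are ours.

```python
from fractions import Fraction
from math import factorial

def derivative(coeffs):
    """Coefficients of the derivative of the polynomial sum(coeffs[k] * s**k)."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def evaluate(coeffs, s):
    # Python evaluates 0**0 as 1, matching the convention 0^0 = 1.
    return sum(c * s**k for k, c in enumerate(coeffs))

def nth_derivative_at(coeffs, n, s):
    for _ in range(n):
        coeffs = derivative(coeffs)
    return evaluate(coeffs, s)

# Hypothetical distribution: P(X = k) for k = 0, 1, 2, 3 (binomial, n = 3, p = 1/2).
probs = [Fraction(1, 8), Fraction(3, 8), Fraction(3, 8), Fraction(1, 8)]

# Second formula: P(X = k) equals the k-th derivative at 0 divided by k!.
recovered = [nth_derivative_at(probs, k, 0) / factorial(k) for k in range(4)]

# Third formula: the n-th derivative at 1 equals E(X (X - 1) ... (X - n + 1)).
from_pgf = [nth_derivative_at(probs, n, 1) for n in (1, 2, 3)]
direct = []
for n in (1, 2, 3):
    moment = Fraction(0)
    for k, p in enumerate(probs):
        falling = 1
        for m in range(n):
            falling *= k - m
        moment += falling * p
    direct.append(moment)

print(recovered == probs, from_pgf == direct)
```

Because the support is finite, $\rho(1-) = 1$ and no limiting argument is needed; the code therefore illustrates the formulas rather than the proof.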


The quantity $E\bigl(\prod_{m=0}^{n-1} (X - m)\bigr)$ is called the $n$th factorial moment of the
random variable $X$. The preceding theorem and our interest in (ordinary)
moments motivates us to consider the relation between the two sequences of
polynomials:

\[ \Bigl(\prod_{m=0}^{n-1} (x - m) : n = 0, 1, 2, \dots\Bigr) \quad \text{and} \quad (x^n : n = 0, 1, 2, \dots), \]

where the empty product obtained for $n = 0$ in the left sequence is defined to
equal the multiplicative identity 1. Consideration of the degrees of the various
polynomials makes it clear that there exist two doubly indexed sequences of
constants

\[ (s(n, k) : 0 \le k \le n,\ n \in \mathbb{Z}^+) \quad \text{and} \quad (S(n, k) : 0 \le k \le n,\ n \in \mathbb{Z}^+) \]

such that

\[ \prod_{m=0}^{n-1} (x - m) = \sum_{k=0}^{n} s(n, k)\, x^k \quad \text{and} \quad x^n = \sum_{k=0}^{n} S(n, k) \prod_{m=0}^{k-1} (x - m). \]

The numbers $s(n, k)$ and $S(n, k)$ are called Stirling numbers of the first and
second kinds, respectively. In order to define 'infinite square matrices' of Stirling
numbers one sets $s(n, k) = S(n, k) = 0$ for $k > n$. It is easy to show that the
two matrices are inverses of each other.
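
The inverse relationship can be checked by machine on finite upper-left blocks of the two matrices. The sketch below is one possible approach, not taken from the text: it builds both tables from recurrences obtained by multiplying the defining identities by one more factor, namely $\prod_{m=0}^{n-1}(x - m) = (x - (n-1)) \prod_{m=0}^{n-2}(x - m)$ and $x \prod_{m=0}^{k-1}(x - m) = \prod_{m=0}^{k}(x - m) + k \prod_{m=0}^{k-1}(x - m)$. The function name and the block size are assumptions made for illustration.

```python
def stirling_tables(size):
    """Tables first[n][k] = s(n, k) and second[n][k] = S(n, k) for 0 <= n, k < size."""
    first = [[0] * size for _ in range(size)]
    second = [[0] * size for _ in range(size)]
    first[0][0] = second[0][0] = 1
    for n in range(1, size):
        for k in range(size):
            prev_first = first[n - 1][k - 1] if k > 0 else 0
            prev_second = second[n - 1][k - 1] if k > 0 else 0
            # s(n, k) = s(n-1, k-1) - (n-1) s(n-1, k)
            first[n][k] = prev_first - (n - 1) * first[n - 1][k]
            # S(n, k) = S(n-1, k-1) + k S(n-1, k)
            second[n][k] = prev_second + k * second[n - 1][k]
    return first, second

size = 7
first, second = stirling_tables(size)

# Product of the two lower-triangular blocks; it should be the identity matrix.
product = [[sum(first[n][j] * second[j][k] for j in range(size)) for k in range(size)]
           for n in range(size)]
identity = [[1 if n == k else 0 for k in range(size)] for n in range(size)]
print(product == identity)
```

Since both infinite matrices are lower triangular, each finite upper-left block of the product depends only on the corresponding blocks of the factors, which is why a finite check is meaningful.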


* Problem 29. Calculate the Stirling numbers $s(n, k)$ and $S(n, k)$ for $n \le 3$ and all
k. Make sure that your answers are consistent with the assertion that the matrix 
of Stirling numbers of the first kind is the inverse of the matrix of Stirling numbers 
of the second kind. Also, use your calculations to express: the second factorial 
moment of a random variable in terms of its first and second (ordinary) moments; 
the second (ordinary) moment in terms of the first and second factorial moments; 
the third factorial moment in terms of the first, second, and third moments; and 
the third moment in terms of the first, second, and third factorial moments. 


Problem 30. Verify the entries in the probability generating function column of
Table 5.1 presented earlier. Then use the probability generating functions, Theorem 13,
and Problem 29 to confirm the means and variances given in that table.


Problem 31. Prove that the function

\[ \rho(s) = \frac{2}{(2 - s)(3 - s)}, \quad 0 \le s \le 1, \]

is a probability generating function of a distribution. Calculate this distribution
and its mean, variance, and standard deviation. Hint: Use partial fractions.




* Problem 32. Replace the function in the preceding problem by 


8 
Mo aaa O<s<l. 


* Problem 33. Fix $p \in [0, 1]$. Prove that the function

\[ \rho(s) = \begin{cases} 1 - \sqrt{1 - 4p(1 - p)s^2} & \text{if } 0 \le s < 1 \\ 1 & \text{if } s = 1 \end{cases} \]

is a probability generating function of a distribution. Calculate the distribution and
its mean and variance, writing the probabilities in the distribution in a form that
makes it immediately apparent that they are nonnegative. (The number $\frac{1}{q+1}\binom{2q}{q}$
involved in the solution is called the $q$th Catalan number.)


5.5. Characterization of probability generating functions 


The following result characterizes those functions which are probability generating
functions of distributions supported by $\overline{\mathbb{Z}}^+$. The 'only if' aspect of the
theorem is obvious and also the most used, often without comment.


Theorem 14. An $\mathbb{R}$-valued function $\rho$ defined on $[0, 1]$ is a probability generating
function of a $\overline{\mathbb{Z}}^+$-valued random variable if and only if $\rho(1) = 1$, $\rho(1-) \le 1$,
$\rho(0) \ge 0$, and all derivatives of $\rho$ are finite and nonnegative on $[0, 1)$, in which
case $\rho$ equals its Maclaurin series on $[0, 1)$.


PROOF. We use $\rho^{(k)}$ to denote the derivative of order $k$ of $\rho$; it equals $\rho$ itself
if $k = 0$. In view of the paragraph preceding the theorem, we can complete the
proof by assuming that $\rho$ has the properties given in the theorem and proving
that

\[ \rho(s) = \sum_{k=0}^{\infty} \frac{\rho^{(k)}(0)}{k!}\, s^k \tag{5.5} \]

for $0 \le s < 1$, because one can then define a corresponding distribution by
$Q(\{k\}) = \rho^{(k)}(0)/k!$ for $k \in \mathbb{Z}^+$ and $Q(\{\infty\}) = 1 - \rho(1-)$.

We begin by obtaining some consequences of the Taylor Formula with remainder.
For $0 \le r < s < 1$,

\[ \rho(s) = \sum_{k=0}^{n-1} \frac{\rho^{(k)}(r)}{k!} (s - r)^k + \frac{\rho^{(n)}(z_{n,r,s})}{n!} (s - r)^n \]

for some $z_{n,r,s}$ satisfying $r < z_{n,r,s} < s$. It follows that

\[ \sum_{k=0}^{n-1} \frac{\rho^{(k)}(r)}{k!} (s - r)^k \le \rho(s) \le \sum_{k=0}^{n-1} \frac{\rho^{(k)}(r)}{k!} (s - r)^k + \frac{\rho^{(n)}(s)}{n!} (s - r)^n \tag{5.6} \]

because the derivatives of $\rho$ are nonnegative. Suppose that $2s - r < 1$ (in addition
to $0 \le r < s < 1$). Setting $r = s$ and $s = 2s - r$ in the left inequality in (5.6),
we conclude that

\[ \sum_{k=0}^{\infty} \frac{\rho^{(k)}(s)}{k!} (s - r)^k \le \rho(2s - r) < \infty, \]

so that $\frac{\rho^{(n)}(s)}{n!} (s - r)^n \to 0$ as $n \to \infty$. We use this fact to conclude from (5.6)
(as it stands) that

\[ \rho(s) = \sum_{k=0}^{\infty} \frac{\rho^{(k)}(r)}{k!} (s - r)^k \quad \text{for } 0 \le r < s < 2s - r < 1. \tag{5.7} \]

The special case $r = 0$ gives (5.5) for $s < \frac{1}{2}$.

For a proof by contradiction of the theorem, suppose that (5.5) fails for some
$s \in [\frac{1}{2}, 1)$. Let $t$ denote the greatest lower bound of the set of such $s$. Choose
$r < t$ and $s > t$ so that $2s - r < 1$. Because $\rho$ equals its Maclaurin series in
the interval $[0, t)$, the derivatives of $\rho$ in the interval $[0, t)$ can be calculated by
term-by-term differentiation. Thus,

\[ \rho^{(k)}(r) = \sum_{m=k}^{\infty} \frac{\rho^{(m)}(0)}{(m - k)!}\, r^{m-k}. \]

Insertion of this expression into (5.7) and an interchange of order of summation
in the resulting double sum gives

\[ \rho(s) = \sum_{m=0}^{\infty} \frac{\rho^{(m)}(0)}{m!} \sum_{k=0}^{m} \binom{m}{k} r^{m-k} (s - r)^k = \sum_{m=0}^{\infty} \frac{\rho^{(m)}(0)}{m!}\, s^m, \]

as desired. Since the summands are nonnegative, the interchange of order of
summation is justified according to a result from advanced calculus, which is
repeated in this book as Corollary 10 of Chapter 6. □


Problem 34. Let 


Use the preceding theorem to prove that there does not exist $\varepsilon > 0$ such that
$f^{(k)}(s) \ge 0$ for all $k \in \mathbb{Z}^+$ and all $s \in [0, \varepsilon)$.


CHAPTER 6 


Calculating Probabilities 
and Measures 


In this chapter, after looking at several ways in which an event $A$ can be defined
in terms of other events $A_1, A_2, \dots$, we will develop methods for calculating
$P(A)$ in terms of the quantities $P(A_1), P(A_2), \dots$. These methods include the
Kochen-Stone and Borel-Cantelli Lemmas, the inclusion-exclusion formula, and
some convergence theorems. Also, in the last section of this chapter, we will
discuss measures other than probability measures.


6.1. Operations on events 


In many cases, it is useful to describe operations on sets in terms of operations 
on the corresponding indicator functions. Here are indicator-function versions 
of some of the set-theoretic operations that we have already been using: 


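\[
\begin{aligned}
I_{A^c} &= 1 - I_A; \\
I_{A \setminus B} &= I_A (1 - I_B); \\
I_{A \cap B} &= I_A \wedge I_B = I_A I_B; \\
I_{A \cup B} &= I_A \vee I_B = I_A + I_B - I_A I_B; \\
I_{\bigcap_n A_n} &= \inf_n I_{A_n} = \prod_n I_{A_n}; \\
I_{\bigcup_n A_n} &= \sup_n I_{A_n}.
\end{aligned}
\]

The set $(A \setminus B) \cup (B \setminus A)$, denoted by $A \mathbin{\triangle} B$, is called the symmetric difference
of $A$ and $B$. If $A$ and $B$ are events in some probability space, then so is $A \mathbin{\triangle} B$.
The symmetric difference of two sets consists of the points where the indicator
functions of the two sets are different. In terms of indicator functions, we have

\[ I_{A \triangle B} = I_A + I_B - 2 I_A I_B. \]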
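
These identities are easy to confirm pointwise on a small finite space. The sketch below uses Python sets for events and 0/1-valued functions for indicators; the space `omega` and the events `A` and `B` are hypothetical choices made only for illustration.

```python
omega = set(range(10))          # hypothetical finite sample space
A = {0, 1, 2, 3, 4}
B = {3, 4, 5, 6}

def indicator(event):
    """Indicator function of `event`, returned as a function of a sample point."""
    return lambda w: 1 if w in event else 0

I_A, I_B = indicator(A), indicator(B)

identities_hold = all(
    indicator(omega - A)(w) == 1 - I_A(w)
    and indicator(A - B)(w) == I_A(w) * (1 - I_B(w))
    and indicator(A & B)(w) == min(I_A(w), I_B(w)) == I_A(w) * I_B(w)
    and indicator(A | B)(w) == max(I_A(w), I_B(w)) == I_A(w) + I_B(w) - I_A(w) * I_B(w)
    and indicator(A ^ B)(w) == I_A(w) + I_B(w) - 2 * I_A(w) * I_B(w)
    for w in omega
)
print(identities_hold)
```

Here `A ^ B` is Python's symmetric difference of sets, and `min` and `max` play the roles of $\wedge$ and $\vee$.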




Problem 1. Let $A$ and $B$ be two events in a probability space $(\Omega, \mathcal{F}, P)$. Show
that $P(A \mathbin{\triangle} B) = 0$ if and only if $I_A = I_B$ a.s., and that each of these conditions is
equivalent to the condition that $A$ and $B$ be equal a.s.


If $I = \lim_n I_{A_n}$ exists, then $I$ is a function which takes only the values 0
and 1, so it is itself an indicator function of some set $A$ contained in $\Omega$. In this
case, we say that the limit of the sequence $(A_1, A_2, \dots)$ exists and equals $A$, and
we write

\[ A = \lim_{n \to \infty} A_n. \]


Problem 2. A sequence of sets is said to be increasing or decreasing, respectively,
if the corresponding sequence of indicator functions is increasing or decreasing.
Prove that

\[ \lim_{n \to \infty} A_n = \bigcup_{n=1}^{\infty} A_n \quad \text{if the sequence } (A_1, A_2, \dots) \text{ is increasing} \]

and

\[ \lim_{n \to \infty} A_n = \bigcap_{n=1}^{\infty} A_n \quad \text{if the sequence } (A_1, A_2, \dots) \text{ is decreasing.} \]


In general, the limit of a sequence of sets may not exist. Let $J = \liminf I_{A_n}$
and $K = \limsup I_{A_n}$. Then $J$ and $K$ are indicator functions of sets $B$ and $C$. We
call $B$ the limit infimum and $C$ the limit supremum of the sequence $(A_1, A_2, \dots)$
and write

\[ B = \liminf_{n \to \infty} A_n \quad \text{and} \quad C = \limsup_{n \to \infty} A_n. \]

From the definitions it follows that the limit of a sequence of sets exists if and
only if the limit infimum and limit supremum are equal, in which case the limit
equals their common value.
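
For a sequence of sets that repeats with a fixed period, the definitions can be evaluated exactly, because the limit infimum and limit supremum of a periodic sequence of zeros and ones are its minimum and maximum over one period. The sketch below rests on that assumption of periodicity; the function name and the alternating example are hypothetical.

```python
def limits_of_periodic(period_sets, omega):
    """Limit infimum and limit supremum of the sequence of sets obtained by
    repeating the finite list `period_sets` forever."""
    def ind(event, w):
        return 1 if w in event else 0
    # For a periodic 0/1 sequence, liminf is the min and limsup is the max over a period.
    lim_inf = {w for w in omega if min(ind(a, w) for a in period_sets) == 1}
    lim_sup = {w for w in omega if max(ind(a, w) for a in period_sets) == 1}
    return lim_inf, lim_sup

omega = {1, 2, 3, 4}
# Hypothetical alternating sequence A_1 = {1, 2}, A_2 = {2, 3}, A_3 = {1, 2}, ...
lim_inf, lim_sup = limits_of_periodic([{1, 2}, {2, 3}], omega)
limit_exists = lim_inf == lim_sup
print(lim_inf, lim_sup, limit_exists)
```

In this example the two sets differ, so the limit of the sequence does not exist; a constant sequence would give equal sets and hence a limit.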


Proposition 1. The limit infimum and limit supremum of a sequence of
events are events. If the limit of the sequence of events exists, it is an event.


Problem 3. Prove the preceding proposition. 


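Problem 4. Let $(A_1, A_2, \dots)$ be a sequence of subsets of a space $\Omega$. Prove that
$\liminf A_n \subseteq \limsup A_n$, and that

\[ \limsup_{n \to \infty} A_n = \bigcap_{n=1}^{\infty} \bigcup_{m=n}^{\infty} A_m \quad \text{and} \quad \liminf_{n \to \infty} A_n = \bigcup_{n=1}^{\infty} \bigcap_{m=n}^{\infty} A_m. \]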

Problem 5. Show that the limit supremum of a sequence of sets is the set of points 
that are members of infinitely many of the sets in the sequence. Give a similar 
description of the limit infimum of the sequence. Also give a description in terms 
of set membership of what it means for the limit of a sequence of sets to exist. 




* Problem 6. Show that $(\liminf_{n \to \infty} A_n)^c = \limsup_{n \to \infty} A_n^c$.


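Problem 7. Suppose that $A_n \subseteq B_n \subseteq C_n$ for all sufficiently large $n \in \mathbb{Z}^+$ and that
$\liminf_{n \to \infty} A_n \supseteq \limsup_{n \to \infty} C_n$. Prove that each of $\lim_{n \to \infty} A_n$, $\lim_{n \to \infty} B_n$,
and $\lim_{n \to \infty} C_n$ exists and that all three limits are the same.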


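* Problem 8. Let $(A_1, A_2, \dots)$ and $(B_1, B_2, \dots)$ be sequences of subsets of a set
$\Omega$. Prove that each of the following equalities holds whenever the right side is
meaningful:

\[
\begin{aligned}
\lim_{n \to \infty} (A_n \cup B_n) &= \bigl(\lim_{n \to \infty} A_n\bigr) \cup \bigl(\lim_{n \to \infty} B_n\bigr); \\
\lim_{n \to \infty} (A_n \cap B_n) &= \bigl(\lim_{n \to \infty} A_n\bigr) \cap \bigl(\lim_{n \to \infty} B_n\bigr); \\
\lim_{n \to \infty} (A_n \setminus B_n) &= \bigl(\lim_{n \to \infty} A_n\bigr) \setminus \bigl(\lim_{n \to \infty} B_n\bigr); \\
\lim_{n \to \infty} (A_n \mathbin{\triangle} B_n) &= \bigl(\lim_{n \to \infty} A_n\bigr) \mathbin{\triangle} \bigl(\lim_{n \to \infty} B_n\bigr); \\
\lim_{n \to \infty} A_n^c &= \bigl(\lim_{n \to \infty} A_n\bigr)^c.
\end{aligned}
\]

Also state and prove similar results involving limits suprema and limits infima,
giving examples to show that equality is not valid in cases where your results are
subset relations. Use indicator functions for some of the proofs and set-theoretic
arguments for others.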


The following exercise brings probability measures into the picture. 


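* Problem 9. Let $(A_1, A_2, \dots)$ be events in a probability space $(\Omega, \mathcal{F}, P)$. Show that

\[ P\bigl(\limsup_{n \to \infty} A_n\bigr) \ge \limsup_{n \to \infty} P(A_n) \ge \liminf_{n \to \infty} P(A_n) \ge P\bigl(\liminf_{n \to \infty} A_n\bigr). \]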


As an immediate consequence of the preceding exercise, we obtain an improved 
version of the Continuity of Measure Theorem, Theorem 2 of Chapter 3. 


Theorem 2. [Continuity of Measure] Let $(A_1, A_2, \dots)$ be a sequence of
events in a probability space $(\Omega, \mathcal{F}, P)$. If $A = \lim_{n \to \infty} A_n$, then $P(A) =
\lim_{n \to \infty} P(A_n)$.


6.2. The Borel-Cantelli and Kochen-Stone Lemmas 


This section is devoted chiefly to the computation of the probability of the limit
supremum of a sequence of events. According to Problem 5, $\limsup_n A_n$ is the
event consisting of those sample points that lie in infinitely many of the events
$A_n$. For a simple example in which such an event is of interest, consider the
coin-flip space, Example 4 of Chapter 1. Let $A_n$ be the event that the $n$th flip is
heads. Then $A = \limsup_n A_n$ is the event that there are infinitely many heads
in the entire sequence of coin flips. The following three lemmas are often quite
useful in calculating or estimating the probability of the limit supremum of a
sequence.




Lemma 3. [Borel] Let $(A_1, A_2, \dots)$ be a sequence of events in a common
probability space $(\Omega, \mathcal{F}, P)$ and set $A = \limsup_{n \to \infty} A_n$. If $\sum_{n=1}^{\infty} P(A_n) < \infty$,
then $P(A) = 0$.


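PROOF. By Problem 4,

\[ A \subseteq \bigcup_{n=m}^{\infty} A_n \]

for each $m$. It follows from the monotonicity and subadditivity of probability
measures that

\[ P(A) \le P\Bigl(\bigcup_{n=m}^{\infty} A_n\Bigr) \le \sum_{n=m}^{\infty} P(A_n). \]

Since $\sum_{n=1}^{\infty} P(A_n) < \infty$, the right side approaches 0 as $m \to \infty$; so $P(A) = 0$. □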


Problem 10. Let $(X_1, X_2, \dots)$ be a sequence of random variables defined on a
common probability space. Assume that each random variable $X_n$ is uniformly
distributed on $[0, 1]$. Prove that for all $a > 1$,


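Lemma 4. [Kochen-Stone] Let $(A_1, A_2, \dots)$ be a sequence of events in a probability
space $(\Omega, \mathcal{F}, P)$ and set $A = \limsup_{n \to \infty} A_n$. If $\sum_{n=1}^{\infty} P(A_n) = \infty$, then

\[ P(A) \ge \limsup_{n \to \infty} \frac{\bigl[\sum_{k=1}^{n} P(A_k)\bigr]^2}{\sum_{k=1}^{n} \sum_{m=1}^{n} P(A_k \cap A_m)}. \tag{6.1} \]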


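PROOF. Let $I_m$ denote the indicator function of $A_m$. For $n \ge 1$ let

\[ J_n = \sum_{m=1}^{n} I_m. \]

Thus, $J_n(\omega)$ is the number of events $A_1, \dots, A_n$ that contain the point $\omega$. Since

\[ E(J_n) = \sum_{m=1}^{n} P(A_m), \tag{6.2} \]

we have by hypothesis that $\lim_n E(J_n) = \infty$. By Problem 5,

\[ A = \{\omega : \lim_{n \to \infty} J_n(\omega) = \infty\}. \]

It follows that for all $\lambda > 0$,

\[ A \supseteq \limsup_{n \to \infty} B_{n,\lambda}, \]

where

\[ B_{n,\lambda} = \{\omega : J_n(\omega) > \lambda E(J_n)\}. \]

By Problem 9,

\[ P(A) \ge P\bigl(\limsup_{n \to \infty} B_{n,\lambda}\bigr) \ge \limsup_{n \to \infty} P(B_{n,\lambda}) \quad \text{for all } \lambda > 0. \tag{6.3} \]

By Corollary 5 of Chapter 5,

\[ P(B_{n,\lambda}) \ge \frac{(1 - \lambda)^2 [E(J_n)]^2}{E(J_n^2)} \tag{6.4} \]

for $0 < \lambda < 1$. By (6.3) and (6.4),

\[ P(A) \ge \limsup_{n \to \infty} \frac{[E(J_n)]^2}{E(J_n^2)}. \]

The proof is now completed by applying (6.2) and noting that

\[ E(J_n^2) = \sum_{k=1}^{n} \sum_{m=1}^{n} E(I_k I_m) = \sum_{k=1}^{n} \sum_{m=1}^{n} P(A_k \cap A_m). \quad □ \]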


The following important special case of the Kochen-Stone Lemma is a partial 
converse of the Borel Lemma. For it, we need the following terminology: two 
events are positively correlated, negatively correlated, or uncorrelated according 
to what their indicator functions are. 


Lemma 5. [Borel-Cantelli] Let $(A_1, A_2, \dots)$ be a sequence of events in a probability
space $(\Omega, \mathcal{F}, P)$. Assume that for each $i \ne j$, the events $A_i$ and $A_j$
are either negatively correlated or uncorrelated. Let $A = \limsup_{n \to \infty} A_n$. If
$\sum_{n=1}^{\infty} P(A_n) = \infty$, then $P(A) = 1$.


Problem 11. Prove the preceding lemma. Hint: The correlation condition is equivalent
to

\[ P(A_k \cap A_m) \le P(A_k) P(A_m) \quad \text{if } k \ne m. \]

Thus it is enough to show that the summands with $k = m$ in the denominator of
the formula in the Kochen-Stone Lemma can be ignored.


The preceding problem requests a proof that is rather straightforward only 
because it is based on a result whose proof is somewhat difficult. Under a 
somewhat more restrictive hypothesis there is a simple direct proof (requested 
in Problem 38 of Chapter 9). 


Problem 12. In the context of the coin-flip space of Example 4 of Chapter 1, let
$A_n$ be the event that the $n$th flip is heads. Use the Borel-Cantelli Lemma to prove
that $P(\limsup_{n \to \infty} A_n) = 1$.




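* Problem 13. Let $(X_1, X_2, \dots)$ be a sequence of pairwise negatively correlated or
uncorrelated random variables, each having a standard Bernoulli distribution.
Show that

\[ \sum_{n=1}^{\infty} X_n = \infty \text{ a.s.} \quad \text{or} \quad \sum_{n=1}^{\infty} X_n < \infty \text{ a.s.} \]

if and only if

\[ E\Bigl(\sum_{n=1}^{\infty} X_n\Bigr) = \infty \quad \text{or} \quad E\Bigl(\sum_{n=1}^{\infty} X_n\Bigr) < \infty, \]

respectively. Construct an example to show that the correlation assumption cannot
be dropped.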


6.3. Inclusion-exclusion 


We turn now to a formula that expresses the probability of a union of events
in terms of sums and differences of probabilities of intersections. This formula
generalizes the identity $P(A \cup B) = P(A) + P(B) - P(A \cap B)$. It is instructive to
note that here, as in the proof of the Kochen-Stone Lemma, the use of indicator
functions facilitates the argument.


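Theorem 6. [Inclusion-Exclusion] Let $C_1, \dots, C_n$ be events in a probability
space $(\Omega, \mathcal{F}, P)$. Then

\[
\begin{aligned}
P\Bigl(\bigcup_{m=1}^{n} C_m\Bigr) = {} & \sum_{m=1}^{n} P(C_m) - \sum_{m=2}^{n} \sum_{l=1}^{m-1} P(C_l \cap C_m) \\
& + \sum_{m=3}^{n} \sum_{l=2}^{m-1} \sum_{k=1}^{l-1} P(C_k \cap C_l \cap C_m) - \cdots + (-1)^{n+1} P\Bigl(\bigcap_{q=1}^{n} C_q\Bigr).
\end{aligned}
\]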
PROOF. Let $I$ be the indicator function of $\bigcup_{m=1}^{n} C_m$ and, for $m = 1, 2, \dots, n$,
let $I_m$ be the indicator function of $C_m$. It suffices to prove that

\[
\begin{aligned}
I = {} & \sum_{m=1}^{n} I_m - \sum_{m=2}^{n} \sum_{l=1}^{m-1} I_l I_m \\
& + \sum_{m=3}^{n} \sum_{l=2}^{m-1} \sum_{k=1}^{l-1} I_k I_l I_m - \cdots + (-1)^{n+1} \prod_{q=1}^{n} I_q,
\end{aligned}
\tag{6.5}
\]

for then the result follows by taking expectations. We will check that both sides
of (6.5) have the same value for each $\omega \in \Omega$. Consider an arbitrary $\omega$ and let
$p(\omega)$ equal the number of events $C_m$ that contain $\omega$ as a member.

If $p(\omega) = 0$, then every term on both sides of (6.5) equals 0. Suppose that
$p(\omega) > 0$. Then the left side of (6.5) equals 1. The value of the right side can be
expressed in the form

\[ \sum_{i=1}^{p(\omega)} (-1)^{i+1} S_i(\omega), \tag{6.6} \]

where $S_i(\omega)$ equals the value of the $i$th term in (6.5), a term that itself involves $i$
summation symbols. Each summand of the $i$th term is the product of indicator
functions and, thus, equals 1 or 0. Therefore, $S_i(\omega)$ equals the number of summands
in the $i$th term that equal 1, that is, the number of ways of choosing $i$
events from among all those events $C_m$ that contain $\omega$. It follows that

\[ S_i(\omega) = \binom{p(\omega)}{i} = \frac{p(\omega)!}{i!\,(p(\omega) - i)!}. \]

Insertion into (6.6) gives 1 minus the binomial series for $(1 - 1)^{p(\omega)}$, so the right
side of (6.5) equals 1. □
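
On a finite sample space with equally likely points, the inclusion-exclusion formula can be evaluated term by term and compared with a direct computation of the probability of the union. The sketch below is an illustration only; the twelve-point space, the three events, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def union_probability(events, prob):
    """Right side of the inclusion-exclusion formula for a list of events (sets)."""
    total = Fraction(0)
    for i in range(1, len(events) + 1):
        sign = 1 if i % 2 == 1 else -1            # (-1)^(i+1)
        for chosen in combinations(events, i):
            total += sign * prob(set.intersection(*chosen))
    return total

# Hypothetical space: the integers 1..12, each with probability 1/12.
omega = set(range(1, 13))

def prob(event):
    return Fraction(len(event), len(omega))

events = [
    {w for w in omega if w % 2 == 0},    # even numbers
    {w for w in omega if w % 3 == 0},    # multiples of 3
    {1, 2, 3},
]

by_formula = union_probability(events, prob)
direct = prob(set.union(*events))
print(by_formula, direct)
```

The loop over `combinations(events, i)` corresponds to the $i$th term of the formula, which sums over all ways of choosing $i$ of the events.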


Problem 14. Suppose two dice are rolled. What is the probability that at least 
one five appears? Use two different methods for this problem. Follow the same 
instructions for three dice, then for n dice. 


Problem 15. In the context of Example 2 of Chapter 4, calculate the probability
that at least one card occupies a position equal to its label. Discuss the limiting
behavior as the number of cards approaches $\infty$.


Problem 16. In the context of Problem 18 of Chapter 4, calculate the probability
that at least one card occupies a position equal to its label. Discuss the limiting
behavior as the number of cards approaches $\infty$.


Problem 17. We know $P(\bigcup_n C_n) \le \sum_n P(C_n)$. It develops that other inequalities
of a similar nature are true. They can be obtained by truncating the right side of
the inclusion-exclusion equality and then proved by a modification of the proof of
that equality. Carry out this program.


6.4. Finite and o-finite measures 


For the remainder of this chapter, we consider some of the consequences of dropping
the assumption that $P(\Omega) = 1$. We are mainly concerned with definitions
and examples here. Further developments will be found in Chapters 7 and 8.


Definition 7. Let $(\Omega, \mathcal{F})$ be a measurable space. A measure on $(\Omega, \mathcal{F})$ is a
countably additive $\overline{\mathbb{R}}^+$-valued function $\mu$ defined on $\mathcal{F}$. For $A \in \mathcal{F}$, the quantity
$\mu(A)$ is called the measure of $A$. If $\mu(\Omega) < \infty$, then $\mu$ is a finite measure on
$(\Omega, \mathcal{F})$. Otherwise, $\mu$ is an infinite measure.




If $(\Omega, \mathcal{F}, \mu)$ is a finite measure space and if $\mu(\Omega) > 0$, then $(\Omega, \mathcal{F}, P)$ is a
probability space, where $P(A) = \mu(A)/\mu(\Omega)$ for all $A \in \mathcal{F}$. Thus, the theory
developed so far for probability spaces is easily generalized to finite measure
spaces, and no further comment is required.

Before focusing on infinite measure spaces, we treat some needed results from 
advanced calculus. 


Proposition 8. Let $S = (s_{m,n} : m, n = 1, 2, \dots)$ be a doubly indexed set of
members of $\overline{\mathbb{R}}$. Suppose that $s_{m,n}$ is increasing in both $m$ and $n$, in the sense
that if $j \le m$ and $k \le n$, then $s_{j,k} \le s_{m,n}$. Then

\[ \lim_{m \to \infty} \lim_{n \to \infty} s_{m,n} = \lim_{n \to \infty} \lim_{m \to \infty} s_{m,n} = \sup S. \]


PROOF. It is clear that $\sup S$ is at least as large as either of the double limits.
On the other hand, if $s < \sup S$, then by definition of the supremum, there exist
integers $j$, $k$ such that $s_{j,k} > s$. It follows from the monotonicity assumption that
$s_{m,n} > s$ for all $m \ge j$ and $n \ge k$. Thus, both double limits are greater than
$s$. Since $s$ is an arbitrary number less than $\sup S$, both double limits are greater
than or equal to $\sup S$. Equality of the double limits and $\sup S$ now follows. □


Corollary 9. [Monotone Convergence for Sums] Let $(a_{m,n} : m, n = 1, 2, \dots)$
be a doubly indexed set of members of $[0, \infty]$. Suppose that $a_{m,n}$ is increasing in
$n$, in the sense that if $k \le n$, then $a_{m,k} \le a_{m,n}$ for all $m$. Then

\[ \sum_{m=1}^{\infty} \lim_{n \to \infty} a_{m,n} = \lim_{n \to \infty} \sum_{m=1}^{\infty} a_{m,n}. \]

PROOF. Let

\[ s_{m,n} = \sum_{j=1}^{m} a_{j,n} \]

and apply Proposition 8. □


Corollary 10. [Fubini Theorem for Nonnegative Sums] Suppose that $A =
(a_{m,n} : m, n = 1, 2, \dots)$ is a doubly indexed set of members of $[0, \infty]$. Then

\[ \sum_{m=1}^{\infty} \sum_{n=1}^{\infty} a_{m,n} = \sum_{n=1}^{\infty} \sum_{m=1}^{\infty} a_{m,n} = \sup \mathcal{S}, \]

where $\mathcal{S}$ is the set of all sums of finitely many members of $A$.


PROOF. Set

\[ s_{m,n} = \sum_{j=1}^{m} \sum_{k=1}^{n} a_{j,k} \]

and let $S = \sup\{s_{m,n} : m, n = 1, 2, \dots\}$. By Proposition 8, the two double sums
in the statement of the corollary are both equal to $S$. Clearly, $S \le \sup \mathcal{S}$. On
the other hand, for any $s \in \mathcal{S}$, there is a partial sum $s_{m,n}$ that is at least as large
as $s$, so $S \ge \sup \mathcal{S}$. The desired equality follows. □




Problem 18. Let $A = (a_n : n = 1, 2, \dots)$ be a sequence of members of $[0, \infty]$, and
let $\pi$ be any permutation of the natural numbers (in other words, $\pi$ is a bijection
from the natural numbers to the natural numbers). Prove that


where S is the set of all sums of finitely many of the terms in A. 
With the preceding tools in hand we return to the study of measures. 


Problem 19. Let $\mu_j$, $j \in J$, be countably many measures on a measurable space
$(\Omega, \mathcal{F})$. Set

\[ \mu(A) = \sum_{j \in J} \mu_j(A) \]

for all $A \in \mathcal{F}$. Prove that $\mu$ is a measure.


Problem 18 ensures that the order of terms is irrelevant in the sum in Problem 19,
and Corollary 10 implies that a true assertion would be obtained even if
that sum were replaced by a double sum of doubly indexed measures.

Problem 19 suggests: Write a measure that one does not understand as a sum 
of measures one does understand, then study the summands, and finally draw 
conclusions about the original measure. This plan of attack will work best when 
the summands do not ‘overlap’. 


Definition 11. Measures $\mu$ and $\nu$ on a measurable space $(\Omega, \mathcal{F})$ are mutually
singular if there exists a set $A \in \mathcal{F}$ such that $\mu(A) = \nu(A^c) = 0$.


Definition 12. A measure that can be written as a countable sum of pairwise
mutually singular finite measures is said to be $\sigma$-finite.


Proposition 13. Let $(\Omega, \mathcal{F}, \mu)$ be a measure space. Then $\mu$ is $\sigma$-finite if and
only if there exists a pairwise disjoint sequence $(B_j : j = 1, 2, \dots)$ of members of
$\mathcal{F}$ such that $\mu(B_j) < \infty$ for every $j$ and

\[ \mu(B) = \sum_{j=1}^{\infty} \mu(B \cap B_j) \]

for every $B \in \mathcal{F}$.


Problem 20. Prove the preceding proposition. 




Example 1. [Lebesgue measure on R] For each integer j, let Aj be the uni- 
form probability measure on the interval [j,7 + 1), with each A; taken to be a 
probability distribution on R. Thus, A; gives 0 measure to the set [j, j +1)°, and 
measure 1 to the set [j,7 + 1), so we have defined a pairwise mutually singular 
collection of finite measures on the Borel sets of R. For any Borel set B C R, 
let ACB) = i jez Aj(B). A straightforward calculation shows that A(Z) = the 
length of J for any interval J. The o-finite measure A is called Lebesgue measure 
on R, or one-dimensional Lebesgue measure. 

If $A$ is a Borel subset of $\mathbb{R}$, then the restriction of $\lambda$ to Borel subsets of $A$ is
called Lebesgue measure on $A$. If $0 < \lambda(A) < \infty$, then we obtain the uniform
distribution on $A$ by dividing Lebesgue measure on $A$ by $\lambda(A)$.
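As an illustration of the construction in Example 1, the following minimal Python sketch evaluates $\lambda$ on a bounded interval $[a, b)$ by summing the masses $\lambda_j([a, b))$ over the integers $j$. The function names `uniform_unit_mass` and `lebesgue_of_interval` are hypothetical, chosen only for this sketch, and only intervals (not general Borel sets) are handled.

```python
import math

def uniform_unit_mass(j, a, b):
    """Mass that the uniform distribution on [j, j+1) gives to [a, b)."""
    lo = max(a, j)
    hi = min(b, j + 1)
    return max(0.0, hi - lo)

def lebesgue_of_interval(a, b):
    """Sum lambda_j([a, b)) over the integers j whose unit interval can meet [a, b)."""
    if b <= a:
        return 0.0
    # Only j with floor(a) <= j < ceil(b) contribute a nonzero term.
    return sum(uniform_unit_mass(j, a, b)
               for j in range(math.floor(a), math.ceil(b)))

# The sum of the pieces recovers the length of the interval.
print(lebesgue_of_interval(-1.5, 2.25))
```

Each summand is at most 1, reflecting that every $\lambda_j$ is a probability measure, while the total can be arbitrarily large.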


Example 2. [Counting measure] Let $\Omega$ be an arbitrary set and $\mathcal{F}$ the $\sigma$-field
of all subsets of $\Omega$. For $B \subseteq \Omega$, let $\mu(B)$ be the cardinality of $B$ if $B$ is finite;
otherwise, let $\mu(B) = \infty$. It is easily seen that $\mu$ is a measure, and that $\mu$ is
$\sigma$-finite if and only if $\Omega$ is countable. The measure $\mu$ is called counting measure
on $\Omega$. If $\Omega$ is finite and nonempty, then we obtain the uniform distribution on $\Omega$
by dividing the counting measure $\mu$ by $\mu(\Omega)$.


Although most of the concepts and theorems for probability spaces can also
be extended to $\sigma$-finite measure spaces, and, in many cases, to general measure
spaces, some cannot. Thus, care must be taken with infinite measure spaces,
even $\sigma$-finite ones. For instance, let $A_n = (n, \infty)$, $n = 1, 2, \dots$, and let $\lambda$ be
Lebesgue measure on $(\mathbb{R}, \mathcal{B})$. Then $A_n \to \emptyset$ as $n \to \infty$, but, contrary to what
one might expect based on the Continuity of Measure Theorem, $\lambda(A_n) = \infty$ for
all $n$, and hence does not go to 0. We will have more to say about the similarities
and differences between finite and infinite measure spaces in subsequent chapters,
particularly in Chapter 8.
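The computation behind this example can be written out using the representation $\lambda = \sum_j \lambda_j$ from Example 1. This is a worked sketch added for illustration, with $\lambda_j$ denoting the uniform distribution on $[j, j+1)$:

```latex
\[
\lambda(A_n) \;=\; \sum_{j \in \mathbb{Z}} \lambda_j\bigl((n,\infty)\bigr)
            \;=\; \sum_{j \ge n} 1 \;=\; \infty
\quad\text{for every } n,
\qquad\text{while}\qquad
\bigcap_{n=1}^{\infty} A_n = \emptyset .
\]
```

Here $\lambda_j((n, \infty)) = 1$ for $j \ge n$ (for $j = n$ the single point $n$ has $\lambda_n$-measure 0) and $\lambda_j((n, \infty)) = 0$ for $j < n$.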


Problem 21. State and prove finite and $\sigma$-finite versions of the Inclusion-Exclusion
Proposition.


Problem 22. Show that every translation invariant $\sigma$-finite measure on $\mathbb{Z}$ (with the
$\sigma$-field of all subsets) is a multiple of counting measure.


CHAPTER 7 


Measure Theory: 
Existence and Uniqueness 


In some of the examples of previous chapters, most notably the coin-flip space,
we defined a sample space $\Omega$ and a $\sigma$-field $\mathcal{F}$, but we did not completely specify
probabilities $P(A)$ for all $A \in \mathcal{F}$. Instead, we only gave the values of $P(A)$ for
events $A$ in a smaller collection $\mathcal{E}$ such that $\mathcal{F} = \sigma(\mathcal{E})$, and then we assumed
without proof that $P$ could be extended in a unique way to all of $\mathcal{F}$. In this
chapter, we close this gap by showing that under certain natural assumptions, a
function $P$ defined on a collection $\mathcal{E}$ of subsets of a sample space $\Omega$ can be
extended in a unique way to a probability measure on $\mathcal{F} = \sigma(\mathcal{E})$. Once probability
measures are constructed, we can piece them together to form $\sigma$-finite measures.
Interesting examples include Lebesgue measure in $\mathbb{R}^d$ and a certain measure on
the space of all lines in $\mathbb{R}^2$, both of which are invariant under rigid motions. In
preparation for this chapter, the reader may want to review (Counter)example 6
in Chapter 1 and to reread the paragraphs leading up to that example and to
Definition 5 of the same chapter.


7.1. The Sierpiński Class Theorem and uniqueness


The uniqueness question can be resolved without much difficulty, so it is treated
first. This question may be formulated as follows: Given a probability space
$(\Omega, \sigma(\mathcal{E}), P)$, what conditions on $\mathcal{E}$ ensure that the values of $P(A)$ for $A \in \mathcal{E}$
uniquely determine the values of $P(A)$ for all $A \in \sigma(\mathcal{E})$? The following definition
and theorem will set the stage for an answer to this question. The term proper
set difference, referring to a set difference $B \setminus A$, entails the assumption that
$A \subseteq B$. (Some use the notation $B - A$ for a proper set difference.)


Definition 1. A Sierpiński class of subsets of a set $\Omega$ is a class that is closed
under limits of increasing sequences of sets and proper set differences.


Problem 1. Let $\mathcal{E}$ be a collection of subsets of a set $\Omega$. Prove that there is a
smallest Sierpiński class of subsets of $\Omega$ that contains $\mathcal{E}$.




Problem 2. Show that if a Sierpiński class of subsets of $\Omega$ is closed under pairwise
intersections and contains $\Omega$, then it is a $\sigma$-field.


Theorem 2. [Sierpiński Class] Let $\mathcal{E}$ be a collection of subsets of a set $\Omega$ and
suppose that $\mathcal{E}$ is closed under pairwise intersections and contains $\Omega$. Then the
smallest Sierpiński class of subsets of $\Omega$ that contains $\mathcal{E}$ equals $\sigma(\mathcal{E})$.


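PROOF. Let $\mathcal{D}$ denote the smallest Sierpiński class containing $\mathcal{E}$. Clearly,
$\mathcal{D} \subseteq \sigma(\mathcal{E})$. Since $\Omega \in \mathcal{E}$ and $\mathcal{E} \subseteq \mathcal{D}$, it follows from Problem 2 that to show
$\sigma(\mathcal{E}) \subseteq \mathcal{D}$, we need only prove that $\mathcal{D}$ is closed under pairwise intersections.

For $A \subseteq \Omega$, define

\[ \mathcal{N}_A = \{B : A \cap B \in \mathcal{D}\}. \tag{7.1} \]

It is easy to check that $\mathcal{N}_A$ is a Sierpiński class for any $A \in \mathcal{D}$. Consider $\mathcal{N}_A$
for $A \in \mathcal{E}$. Since $\mathcal{E}$ is closed under pairwise intersections, $\mathcal{N}_A \supseteq \mathcal{E}$. Therefore, if
$A \in \mathcal{E}$, $\mathcal{N}_A$ is a Sierpiński class that contains $\mathcal{E}$, and hence contains $\mathcal{D}$, since $\mathcal{D}$
is the smallest such class.

Now consider $\mathcal{N}_A$ for $A \in \mathcal{D}$. By the preceding paragraph, the intersection
of $A$ with any member of $\mathcal{E}$ is a member of $\mathcal{D}$. Hence, $\mathcal{N}_A \supseteq \mathcal{E}$. We again
conclude that $\mathcal{N}_A$ is a Sierpiński class that contains $\mathcal{D}$. Therefore, $\mathcal{D}$ is closed
under pairwise intersections. □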
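On a finite set $\Omega$ every increasing sequence of sets is eventually constant, so a Sierpiński class is simply a class closed under proper set differences, and the Sierpiński Class Theorem can be checked by brute force. The Python sketch below does this for two small collections; the function names `sierpinski_closure` and `sigma_field` are hypothetical, and the two collections are illustrative choices. The second collection is not closed under pairwise intersections, and for it the conclusion of the theorem fails.

```python
def sierpinski_closure(collection):
    """Smallest class containing `collection` that is closed under proper
    differences B \\ A with A a subset of B (finite sample space assumed)."""
    cls = set(collection)
    changed = True
    while changed:
        changed = False
        for a in list(cls):
            for b in list(cls):
                if a <= b and (b - a) not in cls:
                    cls.add(b - a)
                    changed = True
    return cls

def sigma_field(omega, collection):
    """Sigma-field generated by `collection` on the finite set `omega`:
    close under complementation and pairwise unions."""
    omega = frozenset(omega)
    cls = set(collection) | {frozenset(), omega}
    changed = True
    while changed:
        changed = False
        for a in list(cls):
            candidates = [omega - a] + [a | b for b in list(cls)]
            for c in candidates:
                if c not in cls:
                    cls.add(c)
                    changed = True
    return cls

# Closed under pairwise intersections and contains omega: the two classes agree.
omega3 = frozenset({1, 2, 3})
E = {frozenset(), frozenset({1}), frozenset({1, 2}), frozenset({1, 3}), omega3}
print(sierpinski_closure(E) == sigma_field(omega3, E))

# Not closed under intersections ({2} is missing): the Sierpinski class is smaller.
omega4 = frozenset({1, 2, 3, 4})
E_bad = {frozenset({1, 2}), frozenset({2, 3}), omega4}
print(len(sierpinski_closure(E_bad)), len(sigma_field(omega4, E_bad)))
```

The second example indicates why closure under pairwise intersections cannot simply be dropped from the hypotheses.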


Theorem 3. [Uniqueness of Measure] Let $P$ and $Q$ be probability measures
on the measurable space $(\Omega, \sigma(\mathcal{E}))$, where $\mathcal{E}$ is a collection of sets closed under
pairwise intersections. If $P(A) = Q(A)$ for every $A \in \mathcal{E}$, then $P = Q$.


* Problem 3. Prove the preceding theorem. 


Problem 4. Show how the Uniqueness of Measure Theorem can be used to prove 
the uniqueness assertion of Proposition 4 of Chapter 3. Incorporate some of Prob- 
lem 13 of Chapter 1 into your discussion. 


Problem 5. Which of the collections given in Problem 13 of Chapter 1 are closed 
under pairwise intersections? 


Problem 6. Verify the assertions of uniqueness in Example 1 and Problem 15, both 
of Chapter 2. 


The Uniqueness of Measure Theorem assures us that we can describe a probability
measure in an unambiguous manner by giving its values for a relatively
small collection of events, as long as this collection is closed under pairwise
intersections. It does not, however, say that probabilities for sets in such a collection
can be specified arbitrarily. For instance, consider the following collection $\mathcal{E}$ of
subsets of $\Omega = \{1, 2, 3\}$:

\[ \{\emptyset, \{1\}, \{1,2\}, \{1,3\}, \Omega\}. \]




This collection is closed under intersections and generates the $\sigma$-field $\mathcal{F}$ of all
subsets of $\Omega$. Nevertheless, there is no way to extend the function $R$, defined by

\[ R(\Omega) = R(\{1,2\}) = R(\{1,3\}) = 1, \quad R(\{1\}) = 1/2, \quad R(\emptyset) = 0, \]

to a probability measure on $\mathcal{F}$. It is easy to check in this example that the
function $R$ does not violate the conditions required of a probability measure on
$\mathcal{E}$. The problem is that while $\mathcal{E}$ is large enough to ensure uniqueness, it is not
large enough to ensure existence.
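To see concretely why no extension exists, note that any probability measure $P$ extending $R$ would have its point masses forced by additivity. The short Python sketch below carries out that arithmetic with exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

# Values of R on the collection from the text, for Omega = {1, 2, 3}.
R = {
    frozenset(): Fraction(0),
    frozenset({1}): Fraction(1, 2),
    frozenset({1, 2}): Fraction(1),
    frozenset({1, 3}): Fraction(1),
    frozenset({1, 2, 3}): Fraction(1),
}

# Additivity would force the point masses of any extension P.
p1 = R[frozenset({1})]
p2 = R[frozenset({1, 2})] - p1   # P({2}) = P({1,2}) - P({1})
p3 = R[frozenset({1, 3})] - p1   # P({3}) = P({1,3}) - P({1})

total = p1 + p2 + p3             # would have to equal R(Omega) = 1
print(total, total == R[frozenset({1, 2, 3})])
```

The forced point masses sum to $3/2$ rather than $1$, so no additive extension to all subsets can agree with $R$.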


7.2. Finitely additive functions defined on fields 


In general, there are two parts to the existence problem. The first is to find a
tractable way to check that a function $R$ satisfies all the properties of a probability
measure on its domain of definition $\mathcal{E}$. The second part is to find a set
of conditions on $\mathcal{E}$ that will guarantee that if $R$ satisfies all the properties of a
probability measure on $\mathcal{E}$, then $R$ can be extended to a probability measure on
$\sigma(\mathcal{E})$. Some of the work done in Chapter 6 concerning sequences of events will
be useful here, although we will need to extend it to a more general setting than
that of $\sigma$-fields.


Definition 4. A field of subsets of a set $\Omega$ is a collection of subsets of $\Omega$ that
has $\emptyset$ as a member and is closed under complementation and pairwise unions.


Definition 5. A real-valued function $R$ defined on a field $\mathcal{E}$ of subsets of a
set $\Omega$ is said to be finitely additive if $R(A \cup B) = R(A) + R(B)$ for every disjoint
pair $A$ and $B$ of members of $\mathcal{E}$. The function $R$ is said to be countably additive
if

\[ R\Bigl(\bigcup_{n=1}^{\infty} A_n\Bigr) = \sum_{n=1}^{\infty} R(A_n) \]

whenever $(A_1, A_2, \dots)$ is a sequence of pairwise disjoint members of $\mathcal{E}$ whose
union is also a member of $\mathcal{E}$.


Problem 7. Let $R$ be a nonnegative finitely additive function defined on a field $\mathcal{E}$
of subsets of a space $\Omega$, such that $R(\Omega) = 1$. Let $A$ and $B$ be members of $\mathcal{E}$ such
that $A \subseteq B$. Prove that $R(\emptyset) = 0$, $R(A) \le R(B)$, $R(B \setminus A) = R(B) - R(A)$, and
$R(A^c) = 1 - R(A)$.


Problem 8. Show that if $R$ is countably additive, then

\[ R\Bigl(\bigcup_{n=1}^{\infty} A_n\Bigr) \le \sum_{n=1}^{\infty} R(A_n) \]

whenever $(A_1, A_2, \dots)$ is a sequence of members of $\mathcal{E}$ whose union is also in $\mathcal{E}$.




The preceding problem shows that if R is countably additive, it possesses 
many of the properties associated with probability measures, even if its domain 
is only a field. It will be seen in the forthcoming lemma that such a function 
R automatically possesses another important property of probability measures, 
namely continuity of measure. Thus, this lemma is a strengthening of the Con- 
tinuity of Measure Theorem of Chapter 3. It is recommended that the reader 
attempt the exercise preceding the lemma before reading the proof of the lemma. 


Problem 9. Prove the following lemma under the additional assumption that the
sequence $(A_1, A_2, \dots)$ is decreasing. Hint: Review the proof of the Continuity of
Measure Theorem of Chapter 3.


Lemma 6. Let $\mathcal{E}$ be a field of subsets of a set $\Omega$, and let $R$ be a nonnegative
countably additive function defined on $\mathcal{E}$ such that $R(\Omega) = 1$. Let $(A_1, A_2, \dots)$ be
a sequence in $\mathcal{E}$ with the property that $\lim_{n\to\infty} A_n = \emptyset$. Then $\lim_{n\to\infty} R(A_n) = 0$.


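PROOF. For $m = 1, 2, \dots$, the function

\[ p \mapsto R\Bigl(\bigcup_{n=m}^{p} A_n\Bigr) \]

increases to a limit $u_m$ as $p \to \infty$. Fix $\varepsilon > 0$. Choose integers $N(m)$, increasing
to $\infty$ as $m$ increases to $\infty$, such that

\[ u_m - R(B_m) < \frac{\varepsilon}{2^m}, \]

where for each $m$,

\[ B_m = \bigcup_{n=m}^{N(m)} A_n. \tag{7.2} \]

Clearly, $B_m \to \emptyset$ as $m \to \infty$.

For $k = 1, 2, \dots$, let

\[ C_k = \bigcap_{m=1}^{k} B_m. \tag{7.3} \]

Since $C_k \searrow \emptyset$ as $k \nearrow \infty$, it follows from Problem 9 that $R(C_k) \to 0$. By (7.2)
and (7.3),

\[ R(A_k \setminus C_k) \le \sum_{m=1}^{k} R(A_k \setminus B_m) \le \sum_{m=1}^{k} \bigl(u_m - R(B_m)\bigr) < \sum_{m=1}^{k} \frac{\varepsilon}{2^m} < \varepsilon. \]

Since $R(C_k) \to 0$ and $R(A_k) \le R(C_k) + R(A_k \setminus C_k)$, we conclude that

\[ \limsup_{k\to\infty} R(A_k) \le \varepsilon. \]

Now let $\varepsilon \searrow 0$. □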


Lemma 7. Let $\mathcal{E}$ be a field of subsets of a set $\Omega$ and $R$ a nonnegative countably
additive function defined on $\mathcal{E}$ such that $R(\Omega) = 1$. Let $(A_1, A_2, \dots)$ and
$(B_1, B_2, \dots)$ be sequences in $\mathcal{E}$ whose limits exist and are equal (the common
limit need not be a member of $\mathcal{E}$). Then $\lim_{n\to\infty} R(A_n)$ and $\lim_{n\to\infty} R(B_n)$
exist and are equal.


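PROOF. It is sufficient to show that if $(m_1 < m_2 < \dots)$ is any strictly
increasing sequence of positive integers, then $R(A_n) - R(B_{m_n}) \to 0$ as $n \to \infty$.
By Problem 7,

\[ |R(A_n) - R(B_{m_n})| \le R(A_n \cup B_{m_n}) - R(A_n \cap B_{m_n}) = R\bigl((A_n \cup B_{m_n}) \setminus (A_n \cap B_{m_n})\bigr). \tag{7.4} \]

By Problem 8 of Chapter 6,

\[ \lim_{n\to\infty} \bigl((A_n \cup B_{m_n}) \setminus (A_n \cap B_{m_n})\bigr) = \emptyset. \]

It follows from Lemma 6 that

\[ \lim_{n\to\infty} R\bigl((A_n \cup B_{m_n}) \setminus (A_n \cap B_{m_n})\bigr) = 0. \]

Therefore, by (7.4), $R(A_n) - R(B_{m_n}) \to 0$, as desired. □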


Corollary 8. Let $\mathcal{E}$ and $R$ be as in the preceding lemma. If $(A_1, A_2, \dots)$ is
a sequence in $\mathcal{E}$ that converges to $A \in \mathcal{E}$, then $R(A) = \lim_{n\to\infty} R(A_n)$.


* Problem 10. Prove the preceding corollary.




We have seen a variety of conclusions that follow from countable additivity. 
The following proposition provides a useful condition under which finite additiv- 
ity implies countable additivity. 


Proposition 9. Let $R$ be a nonnegative finitely additive function defined on
a field $\mathcal{E}$ of subsets of a set $\Omega$, such that $R(\Omega) = 1$. Then $R$ is countably additive
if and only if $R(A_n) \to 0$ for every decreasing sequence $(A_1, A_2, \dots)$ in $\mathcal{E}$ for
which $\lim A_n = \emptyset$.


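PROOF. The ‘only if’ is a special case of Lemma 6 (see also Problem 9). To
prove the ‘if’ part, let $(B_1, B_2, \dots)$ be a sequence of pairwise disjoint members
of $\mathcal{E}$ whose union $B$ is also a member of $\mathcal{E}$. Let

\[ A_n = B \setminus \bigcup_{m=1}^{n} B_m. \]

Then $(A_1, A_2, \dots)$ is a decreasing sequence in $\mathcal{E}$ whose limit equals $\emptyset$. Hence
$R(A_n) \to 0$. It follows that

\begin{align*}
R(B) - \sum_{m=1}^{\infty} R(B_m)
  &= \lim_{n\to\infty} \Bigl( R(B) - \sum_{m=1}^{n} R(B_m) \Bigr) \\
  &= \lim_{n\to\infty} \Bigl( R(B) - R\Bigl(\bigcup_{m=1}^{n} B_m\Bigr) \Bigr) \\
  &= \lim_{n\to\infty} R(A_n) = 0.
\end{align*}

Therefore, $R$ is countably additive. □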


7.3. Existence, extension, and completion of measures 


Here is a typical situation in which the existence question arises. We start
with a field $\mathcal{E}$ of subsets of a space $\Omega$, and we are able to specify probabilities
$R(A)$ for sets $A$ that are members of $\mathcal{E}$. It usually happens naturally that $R$
is nonnegative and finitely additive, and that $R(\Omega) = 1$. If we can verify the
condition in Proposition 9, then we know that $R$ is countably additive. We will
now prove that if $R$ is countably additive, it can be extended to a probability
measure defined on $\sigma(\mathcal{E})$.

The proof requires several steps which we briefly describe here. First we let
$\mathcal{E}_1$ be the collection of subsets of $\Omega$ which are limits of sequences of members of
$\mathcal{E}$. The collection $\mathcal{E}_1$ is easily shown to be a field which contains $\mathcal{E}$. Next we use
Lemma 7 and Proposition 9 to show that $R$ can be extended to a nonnegative
countably additive function defined on $\mathcal{E}_1$. The procedure is then repeated, with
$\mathcal{E}_1$ in the place of $\mathcal{E}$. The result is that $R$ is extended to a nonnegative countably
additive function defined on a field $\mathcal{E}_2$ which contains all limits of sequences
of members of $\mathcal{E}_1$. We could continue to repeat this procedure, successively
extending $R$ to fields $\mathcal{E}_3, \mathcal{E}_4, \dots$. However, it develops that the collection $\mathcal{E}_2$
plays a special role. We introduce a collection $\mathcal{D}$ of subsets of $\Omega$ called the
completion of $\mathcal{E}_2$. A set $B$ is a member of $\mathcal{D}$ if there exist sets $A$ and $C$ in $\mathcal{E}_2$
such that $A \subseteq B \subseteq C$ and $R(A) = R(C)$. We show that $\mathcal{D}$ is a $\sigma$-field which
contains $\sigma(\mathcal{E})$. Finally we extend the function $R$ to $\mathcal{D}$ in a natural way and show
that the extension is countably additive.


Lemma 10. Let $\mathcal{E}$ be a field of subsets of a space $\Omega$. Define $\mathcal{E}_1$ to be the
collection of subsets of $\Omega$ which are limits of sequences of members of $\mathcal{E}$. Then
$\mathcal{E}_1$ is a field which contains $\mathcal{E}$.


* Problem 11. Prove the preceding lemma. 


Let $\mathcal{E}$ and $\mathcal{E}_1$ be as in Lemma 10, and let $R$ be a nonnegative countably
additive function defined on $\mathcal{E}$ such that $R(\Omega) = 1$. For $A \in \mathcal{E}_1$ and $(A_1, A_2, \dots)$
a sequence in $\mathcal{E}$ whose limit is $A$, define

\[ R_1(A) = \lim_{n\to\infty} R(A_n). \]

It follows from Lemma 7 that this limit exists and does not depend on the
choice of the sequence $(A_1, A_2, \dots)$. If $A \in \mathcal{E}$, it follows from Corollary 8 that
$R_1(A) = R(A)$. Thus $R_1$ is an extension of $R$ to $\mathcal{E}_1$. It is clear that $R_1$ is
nonnegative and $R_1(\Omega) = 1$.


Problem 12. Show that $R_1$ is finitely additive. Hint: If $A$ and $B$ are disjoint
members of $\mathcal{E}_1$, show that there are sequences $(A_1, A_2, \dots)$ and $(B_1, B_2, \dots)$ in $\mathcal{E}$,
with limits $A$ and $B$, respectively, such that for each $n = 1, 2, \dots$, $A_n$ and $B_n$ are
disjoint.


Lemma 11. The extension $R_1$ is countably additive.


PROOF. By the preceding problem, $R_1$ is finitely additive. Thus, by Proposition 9,
it is enough to show that if $(A_1, A_2, \dots)$ is a decreasing sequence in $\mathcal{E}_1$ that
converges to $\emptyset$, then $R_1(A_n) \to 0$ as $n \to \infty$. By the definition of $\mathcal{E}_1$, for each $m$
there exists a sequence $(B_{m,n}, n = 1, 2, \dots)$ in $\mathcal{E}$ such that $A_m = \lim_{n\to\infty} B_{m,n}$.
Let

\[ C_{m,n} = \bigcap_{j=1}^{m} B_{j,n}. \]

Then since the sequence $(A_1, A_2, \dots)$ is decreasing,

\[ \lim_{n\to\infty} C_{m,n} = \lim_{n\to\infty} \Bigl( \bigcap_{j=1}^{m} B_{j,n} \Bigr) = \bigcap_{j=1}^{m} A_j = A_m. \]

By the definition of $R_1$, $R_1(A_m) = \lim_{n\to\infty} R(C_{m,n})$, so for $m = 1, 2, \dots$ we can
choose integers $n_m$ increasing to $\infty$ as $m$ increases to $\infty$, such that

\[ R_1(A_m) - R(C_{m,n_m}) \to 0 \quad \text{as } m \to \infty. \tag{7.5} \]

To finish the proof that $R_1(A_m) \to 0$ we will show that $R(C_{m,n_m}) \to 0$ by
demonstrating that $C_{m,n_m} \to \emptyset$ and appealing to Lemma 6. Since for fixed $n$ the
sequence $(C_{k,n}, k = 1, 2, \dots)$ is decreasing, we can conclude that for each fixed
$k$,

\[ \limsup_{m\to\infty} C_{m,n_m} \subseteq \limsup_{m\to\infty} C_{k,n_m} = A_k. \]

Now let $k \to \infty$ to obtain $\limsup_{m\to\infty} C_{m,n_m} = \emptyset$, as desired. □


It is clear that the procedure of the preceding two lemmas can be repeated
to produce an increasing sequence of fields $\mathcal{E}_n$, with corresponding nonnegative
countably additive functions $R_n$ defined on $\mathcal{E}_n$. The field $\mathcal{E}_{n+1}$ is obtained by
taking all limits of sequences of members of $\mathcal{E}_n$, and $R_{n+1}$ is an extension of $R_n$.
Each of the fields $\mathcal{E}_n$ contains $\mathcal{E}$, and each $R_n$ extends $R$ to $\mathcal{E}_n$.


Problem 13. Consider the collection of intervals in $\mathbb{R}$ of the following four types:
$(a, b]$ for $a < b$, $(-\infty, b]$, $(a, \infty)$, and $(-\infty, \infty)$. Let $\mathcal{E}$ be the collection consisting
of all finite unions of such intervals. We include the empty set in $\mathcal{E}$ as the union of
an empty collection of intervals. Show that $\mathcal{E}$ is a field. To which of the fields $\mathcal{E}_n$
described above do the finite subsets of $\mathbb{R}$ belong? What about the set of rational
numbers? The open sets? The set $[0, 1)$?


It is possible for each field $\mathcal{E}_{n+1}$ to contain new sets that are not members of
$\mathcal{E}_n$, so that in general, $\sigma(\mathcal{E})$ is strictly larger than any of the fields $\mathcal{E}_n$. In fact, in
many important cases, $\sigma(\mathcal{E})$ is strictly larger than the union of all the fields $\mathcal{E}_n$.
Fortunately, as the following lemma indicates, there is a sense in which nothing
essentially new appears after the extension to $\mathcal{E}_2$.


Lemma 12. Let $B \in \sigma(\mathcal{E})$. Then there exist $A, C \in \mathcal{E}_2$ such that $A \subseteq B \subseteq C$
and $R_2(A) = R_2(C)$.


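PROOF. Let $\mathcal{G}$ be the collection of all sets that are the limits of decreasing
sequences of members of $\mathcal{E}$, and let $\mathcal{H}$ be the collection of all sets that are the
limits of increasing sequences of members of $\mathcal{E}$. Note that $\mathcal{G}$ and $\mathcal{H}$ are contained
in $\mathcal{E}_1$, so $R_1$ is defined on $\mathcal{G}$ and $\mathcal{H}$. Define

\[ \mathcal{D} = \{B : \forall \varepsilon > 0,\ \exists\, G \in \mathcal{G},\ H \in \mathcal{H} \text{ such that } G \subseteq B \subseteq H \text{ and } R_1(H) - R_1(G) < \varepsilon\}. \]

It is clear that the conclusion of the lemma holds for every $B \in \mathcal{D}$ and that
$\mathcal{E} \subseteq \mathcal{D}$. Accordingly, we can finish the proof by showing that $\mathcal{D}$ is a Sierpiński
class. It is easy to see that it is closed under proper set differences. All that
remains is to show that it is closed under increasing limits.

Suppose that $B_m \nearrow B$ as $m \nearrow \infty$ and $B_m \in \mathcal{D}$ for each $m$. Let $\varepsilon > 0$. For
each positive integer $m$, choose $G_m \in \mathcal{G}$ and $H_m \in \mathcal{H}$ so that

\[ G_m \subseteq B_m \subseteq H_m \quad\text{and}\quad R_1(H_m) - R_1(G_m) < \frac{\varepsilon}{2^{m+1}}. \]

Let

\[ H = \bigcup_{m=1}^{\infty} H_m. \]

Since a countable union of members of $\mathcal{H}$ is itself a limit of an increasing sequence
of members of $\mathcal{E}$, the set $H$ is a member of $\mathcal{H}$. By Lemma 7 applied to $R_1$,

\[ R_1(H) = \lim_{n\to\infty} R_1\Bigl( \bigcup_{m=1}^{n} H_m \Bigr). \]

Choose $n$ large enough so that

\[ R_1(H) - R_1\Bigl( \bigcup_{m=1}^{n} H_m \Bigr) < \frac{\varepsilon}{2}, \]

and let

\[ G = \bigcup_{m=1}^{n} G_m. \]

Any finite union of members of $\mathcal{G}$ is also a limit of a decreasing sequence of
members of $\mathcal{E}$, so the set $G$ is a member of $\mathcal{G}$. Note that since the sequence
$B_1, B_2, \dots$ is increasing,

\[ G \subseteq B_n \subseteq B. \]

The definition of $H$ ensures that $B \subseteq H$, so $G \subseteq B \subseteq H$. Also note that

\[ G \subseteq \bigcup_{m=1}^{n} H_m \subseteq H, \]

so finite additivity implies that

\begin{align*}
R_1(H \setminus G) &= R_1\Bigl( H \setminus \bigcup_{m=1}^{n} H_m \Bigr) + R_1\Bigl( \bigcup_{m=1}^{n} H_m \setminus \bigcup_{m=1}^{n} G_m \Bigr) \\
  &\le \frac{\varepsilon}{2} + \sum_{m=1}^{n} \frac{\varepsilon}{2^{m+1}} < \varepsilon.
\end{align*}

It follows that $B$ is a member of $\mathcal{D}$ as desired. □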


Problem 14. Let $Q$ be a probability measure defined on $(\mathbb{R}^d, \mathcal{B})$. A set $B$ is called
regular for $Q$ if for each $\varepsilon > 0$, there exists a compact set $K$ and an open set $O$
such that $K \subseteq B \subseteq O$ and $Q(O \setminus K) < \varepsilon$. Show that every set in the completion
of the Borel $\sigma$-field is regular. Hint: Mimic the proof of Lemma 12 to show that
the collection of sets that are regular for $Q$ is a Sierpiński class. Then show that
every open rectangular box is regular for $Q$.




Lemma 12 motivates the following definition. 


Definition 13. Let $\mathcal{E}$ be a field of subsets of a space $\Omega$, and let $R$ be a
nonnegative countably additive function defined on $\mathcal{E}$ such that $R(\Omega) = 1$. The
completion of $\mathcal{E}$ with respect to $R$ is defined to be the collection of all sets $B \subseteq \Omega$
such that there exist sets $A$ and $C$ in $\mathcal{E}$ satisfying $A \subseteq B \subseteq C$ and $R(C \setminus A) = 0$.


Lemma 12 says that $\sigma(\mathcal{E})$ is contained in the completion of $\mathcal{E}_2$ with respect to
$R_2$. We finally have all the ingredients that we need for the existence theorem,
which may be more properly called the ‘Extension Theorem’.


Theorem 14. [Extension] Let $\mathcal{E}$ be a field of subsets of a space $\Omega$ and $R$ a
nonnegative countably additive function defined on $\mathcal{E}$ such that $R(\Omega) = 1$. Then
there exists a unique probability measure $P$ defined on $\sigma(\mathcal{E})$ such that $P(A) =
R(A)$ for every $A \in \mathcal{E}$.


PROOF. Since $\mathcal{E}$ is a field, it is closed under pairwise intersections. Thus the
uniqueness assertion follows from the Uniqueness of Measure Theorem.

Let $R_2$ be the extension of $R$ to $\mathcal{E}_2$, as in the paragraph preceding Problem 13.
By Lemma 12, $\sigma(\mathcal{E})$ is in the completion of $\mathcal{E}_2$ with respect to $R_2$. Thus, for each
set $B \in \sigma(\mathcal{E})$ we can choose sets $A$ and $C$ which are members of $\mathcal{E}_2$ such that
$A \subseteq B \subseteq C$ and $R_2(C \setminus A) = 0$. Assuming such a choice has been made, define
$P(B) = R_2(A)$. That $P(B)$ is well-defined follows from the fact that for any
fixed choice of $C$, $R_2(A) = R_2(C)$ independently of the choice of $A$. It follows
in particular that $P(B) = R(B)$ for $B \in \mathcal{E}$.

It remains to show that $P$ is a probability measure. Clearly $P$ is nonnegative
and $P(\Omega) = 1$. We now show that $P$ is finitely additive. Let $B_1, B_2$ be disjoint
members of $\sigma(\mathcal{E})$. There exist sets $A_1, A_2, C_1, C_2 \in \mathcal{E}_2$ such that $A_i \subseteq B_i \subseteq C_i$
and $P(B_i) = R_2(A_i) = R_2(C_i)$ for $i = 1, 2$. Since $A_1$ and $A_2$ are clearly disjoint,

\[ R_2(A_1 \cup A_2) = R_2(A_1) + R_2(A_2) = P(B_1) + P(B_2) \]

by the finite additivity of $R_2$. So to prove the finite additivity of $P$, it is enough
to show that $R_2(A_1 \cup A_2) = P(B_1 \cup B_2)$. But this last equality follows from the
definition of $P$ and the fact that

\begin{align*}
R_2\bigl((C_1 \cup C_2) \setminus (A_1 \cup A_2)\bigr) &\le R_2\bigl((C_1 \setminus A_1) \cup (C_2 \setminus A_2)\bigr) \\
  &\le R_2(C_1 \setminus A_1) + R_2(C_2 \setminus A_2) = 0.
\end{align*}

To show that $P$ is a probability measure, it remains to show that $P$ is countably
additive. We will apply Proposition 9. Let $(B_1, B_2, \dots)$ be a decreasing
sequence in $\sigma(\mathcal{E})$ that converges to $\emptyset$. By the definition of $P$, there exists a
sequence $(A_1, A_2, \dots)$ in $\mathcal{E}_2$ such that $A_n \subseteq B_n$ and $P(B_n) = R_2(A_n)$ for each $n$.
Clearly $A_n \to \emptyset$ as $n \to \infty$. So, by Lemma 6, $R_2(A_n) \to 0$. Thus, $P(B_n) \to 0$,
and the countable additivity of $P$ follows from Proposition 9. □




Problem 15. Let $(\Omega, \mathcal{F}, P)$ be a probability space. Show that the completion of $\mathcal{F}$
with respect to $P$ is a $\sigma$-field, and that $P$ can be extended uniquely to this $\sigma$-field.
(This extension is called the completion of $P$.)


7.4. Examples 


The following example shows how to apply the Extension Theorem to a general- 
ization of the coin-flip probability space, a generalization in which ‘biased’ coins 
are considered. For the special case of a fair coin, it establishes that there really 
is a probability space as described in Example 4 of Chapter 1 for the experiment 
of flipping a fair coin an infinite number of times. An approach that avoids 
topology is given in the proof of Theorem 16 in Chapter 9. 


Example 1. Let $\Omega$ equal the set of infinite sequences $\omega = (\omega_1, \omega_2, \dots)$, where
each $\omega_i \in \{0, 1\}$. Thus, $\Omega$ is the product of countably many copies of the set
$\{0, 1\}$. We let $\Omega$ have the product topology, where each copy of $\{0, 1\}$ is given the
discrete topology. Since the topological product of compact spaces is compact,
$\Omega$ is a compact topological space. Let $\mathcal{E}$ be the collection of all sets of the form

\[ \{\omega : (\omega_1, \omega_2, \dots, \omega_n) \in C_n\} \quad\text{for } C_n \subseteq \{0,1\}^n. \tag{7.6} \]

It is easily seen that $\mathcal{E}$ is a field, and that every set in $\mathcal{E}$ is a closed set. Since
$\Omega$ is a compact set, it follows that every member of $\mathcal{E}$ is compact. By the finite
intersection property of compact sets, if $A_1, A_2, \dots$ is any sequence of members
of $\mathcal{E}$ that decreases to $\emptyset$, then $A_n = \emptyset$ for all sufficiently large $n$. It follows from
Proposition 9 that if $R$ is any nonnegative finitely additive function defined on $\mathcal{E}$,
such that $R(\Omega) = 1$, then $R$ is countably additive. By the Extension Theorem,
any such $R$ can be extended to a probability measure on the $\sigma$-field $\sigma(\mathcal{E})$. (It
is not hard to prove that every open subset of $\Omega$ can be written as a countable
union of members of $\mathcal{E}$, so in fact, $\sigma(\mathcal{E})$ is the $\sigma$-field of Borel subsets of $\Omega$.)

We will construct a class of nonnegative finitely additive functions defined on
$\mathcal{E}$, such that $R(\Omega) = 1$. The model for repeated flips of a fair coin is a special
case. Let $p_1, p_2, \dots$ be a sequence of real numbers in the interval $[0, 1]$. For
$C_n \subseteq \{0,1\}^n$, set

\[ R(\{\omega : (\omega_1, \dots, \omega_n) \in C_n\}) = \sum_{(\omega_1, \dots, \omega_n) \in C_n} \prod_{m=1}^{n} p_m^{\omega_m} (1 - p_m)^{1 - \omega_m}. \]

(In this formula, we understand $0^0$ to equal 1.) There is an apparent ambiguity
in this definition, which arises from the fact that any set which can be expressed
in the form (7.6) can also be written in the form

\[ \{\omega : (\omega_1, \dots, \omega_{n+1}) \in D_{n+1}\}, \]

where $D_{n+1}$ is the set of members of $\{0,1\}^{n+1}$ whose first $n$ coordinates form
a sequence of 0’s and 1’s which lies in $C_n$. That there is no real ambiguity is
shown by a straightforward computation which is left to the reader.

If $A$ and $B$ are two disjoint members of $\mathcal{E}$, then they can be written in the
form (7.6) using a common value of $n$. The summation in the definition of $R$
now makes the finite additivity of $R$ clear.

The particular case in which $p_n = 1/2$ for each $n$ gives the coin-flip probability
space of Example 4 of Chapter 1. For the general case, we can think of an
experiment in which a different coin is tossed at each stage, with $p_n$ being the
probability that the result of the $n$-th toss is heads. We will have more to say about
the general case in Chapter 9, when the notion of ‘independence’ is introduced.
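The definition of $R$ on sets of the form (7.6) can be made concrete with a small Python sketch that uses exact fractions. It evaluates the sum of products, checks that rewriting a set with one more coordinate (the set $D_{n+1}$) does not change its value, and checks finite additivity on one disjoint pair. The function names `cylinder_prob` and `lift` and the particular values of $p_n$ are assumptions made for illustration; note that the list `p` is indexed from 0, so `p[m]` plays the role of $p_{m+1}$.

```python
from fractions import Fraction
from math import prod

def cylinder_prob(C, p):
    """R of {w : (w_1, ..., w_n) in C}, where C is a set of 0/1 tuples of
    common length n and p[m] is the heads probability of flip m + 1."""
    return sum(
        # p^w * (1 - p)^(1 - w) with 0^0 read as 1 is p if w == 1, else 1 - p.
        prod(p[m] if w[m] == 1 else 1 - p[m] for m in range(len(w)))
        for w in C
    )

def lift(C):
    """The set D_{n+1}: tuples of length n + 1 whose first n coordinates lie in C."""
    return {w + (bit,) for w in C for bit in (0, 1)}

p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)]  # illustrative biases

first_flip_heads = {(1, 0), (1, 1)}
print(cylinder_prob(first_flip_heads, p))             # value using n = 2
print(cylinder_prob(lift(first_flip_heads), p))       # same set, using n = 3

A, B = {(1, 0)}, {(0, 1)}                             # disjoint cylinder sets
print(cylinder_prob(A | B, p) == cylinder_prob(A, p) + cylinder_prob(B, p))
```

Setting every entry of `p` to `Fraction(1, 2)` gives the fair-coin case.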


Problem 16. Suppose, in the preceding example, that each $p_n$ equals a common
value $p$. Let $X_n$ denote the number of heads in the first $n$ coin flips. Calculate and
name the distribution of $X_n$.


* Problem 17. Suppose, in Example 1, that $p_n = n^{-\beta}$ for some $\beta > 0$. Calculate
the probability that at least two of the first four flips are heads. For $n = 1, 2, \dots$,
let $A_n$ be the event that the $n$-th flip is heads. Describe in terms of coin flips the
events $\liminf A_n$ and $\limsup A_n$, and calculate their probabilities. Let

\[ Y(\omega) = \inf\{n : \omega \in A_n\} \quad\text{and}\quad Z(\omega) = \sup\{n : \omega \in A_n\}. \]

Calculate the distributions of $Y$ and $Z$, and describe in terms of coin flips what
these random variables represent.


In Example 1 of Chapter 2 we showed how to use the coin-flip model to
construct a probability space $(\Omega, \mathcal{F}, P)$ in which $\Omega$ is the unit interval $[0, 1]$ and
$P$ is Lebesgue measure. In Proposition 4 of Chapter 3 we used this probability
space $(\Omega, \mathcal{F}, P)$ to construct all possible distributions on $\mathbb{R}$. Example 1 puts this
work on a solid theoretical foundation by showing that the coin-flip probability
space exists. Nevertheless, a more direct approach to constructing distributions
on $\mathbb{R}$ may be instructive.


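Example 2. Let $F$ be an arbitrary distribution function for $\mathbb{R}$, and let $\mathcal{E}$ be
the field of subsets of $\mathbb{R}$ described in Problem 13. For $-\infty < a < b < \infty$, set

\[ R((a, b]) = F(b) - F(a), \quad R((-\infty, b]) = F(b), \quad R((a, \infty)) = 1 - F(a), \quad\text{and}\quad R((-\infty, \infty)) = 1. \]

If $A_1, \dots, A_n$ are disjoint intervals which are members of $\mathcal{E}$, then we define

\[ R\Bigl( \bigcup_{k=1}^{n} A_k \Bigr) = \sum_{k=1}^{n} R(A_k). \]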



The key to showing that this definition is unambiguous is the following calculation,
together with others similar to it involving unbounded intervals:

\[ R((a, b]) + R((b, c]) = (F(b) - F(a)) + (F(c) - F(b)) = F(c) - F(a) = R((a, c]). \]

Thus $R$ is a well-defined nonnegative function defined on $\mathcal{E}$ such that $R(\Omega) = 1$
(here $\Omega = \mathbb{R}$). Finite additivity is immediate from the definition of $R$.
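The definition of $R$ from a distribution function can be sketched in a few lines of Python. The helper names `R_interval` and `R_union` are hypothetical, and the exponential distribution function `F_exp` is merely an illustrative choice of $F$; the conventions $F(-\infty) = 0$ and $F(\infty) = 1$ are used to treat the unbounded intervals uniformly.

```python
import math

def R_interval(F, a, b):
    """R of the interval (a, b], where a may be -inf and b may be +inf
    (for b = +inf the interval is read as (a, inf))."""
    upper = 1.0 if b == math.inf else F(b)
    lower = 0.0 if a == -math.inf else F(a)
    return upper - lower

def R_union(F, intervals):
    """R of a finite union of pairwise disjoint intervals, given as (a, b) pairs."""
    return sum(R_interval(F, a, b) for a, b in intervals)

def F_exp(x):
    """Distribution function of the standard exponential distribution."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

# Splitting (0, 3] at 1 does not change the value assigned to it.
print(R_interval(F_exp, 0, 1) + R_interval(F_exp, 1, 3), R_interval(F_exp, 0, 3))

# The whole line, written as (-inf, 0] together with (0, inf), has total mass 1.
print(R_union(F_exp, [(-math.inf, 0), (0, math.inf)]))
```

The first printed pair illustrates the telescoping calculation that makes the definition unambiguous.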

The field $\mathcal{E}$ generates the Borel $\sigma$-field $\mathcal{B}$ of subsets of $\mathbb{R}$. To show that $R$ can
be extended uniquely to a probability measure on $\mathcal{B}$, it is enough to show that
$R$ is countably additive on $\mathcal{E}$. By Proposition 9, this is equivalent to showing
that $R(A_n) \to 0$ for each decreasing sequence $(A_1, A_2, \dots)$ of members of $\mathcal{E}$ such
that $\lim A_n = \emptyset$. The reader is asked to prove this fact in the following exercise.


Problem 18. Let $\mathcal{E}$ and $R$ be as in the preceding example. Prove that $R$ is countably
additive on $\mathcal{E}$. Hint: Fix $\varepsilon > 0$ and show that for each set $A_k$ in the decreasing
sequence $(A_1, A_2, \dots)$, there exists a compact set $C_k$ and a set $B_k \in \mathcal{E}$ such that
$B_k \subseteq C_k \subseteq A_k$ and $R(A_k \setminus B_k) < \varepsilon/2^k$. Since $A_n \to \emptyset$, the finite intersection
property implies that

\[ \bigcap_{k=1}^{n} C_k = \emptyset \]

for all sufficiently large $n$. It follows for such $n$ that

\[ A_n = \bigcup_{k=1}^{n} (A_n \setminus B_k) \subseteq \bigcup_{k=1}^{n} (A_k \setminus B_k). \]

Conclude that $R(A_n) < \varepsilon$ for all sufficiently large $n$.


Example 2 shows that Lebesgue measure on [0,1] exists, and Example 1 of 
Chapter 6 shows how to use that fact to construct Lebesgue measure on R. The 
following example explores an important property of Lebesgue measure on R. 


Example 3. Let $\mu$ be Lebesgue measure on $\mathbb{R}$. For $x \in \mathbb{R}$ and $A \subseteq \mathbb{R}$, let
$A + x$ denote the set $\{y + x : y \in A\}$. The continuity of addition and subtraction
implies that $A \in \mathcal{B}$ if and only if $A + x \in \mathcal{B}$ for all $x \in \mathbb{R}$. We want to prove
that $\mu(A) = \mu(A + x)$ for all $A \in \mathcal{B}$ and $x \in \mathbb{R}$. This property of $\mu$ is called
translation invariance. Fix $x \in \mathbb{R}$, and let $\mathcal{E}$ be the class of all sets $A \in \mathcal{B}$ such
that $\mu(A) = \mu(A + x)$. It is easy to check that $\mathcal{E}$ is a Sierpiński class. If $I$
is an interval, $\mu(I) = \mu(I + x)$ since $I$ and $I + x$ have the same length. Thus
$\mathcal{E}$ contains the collection of all intervals. Since the collection of all intervals
is closed under pairwise intersections, contains $\mathbb{R}$, and generates $\mathcal{B}$, it follows
from the Sierpiński Class Theorem that $\mathcal{E} = \mathcal{B}$. Thus, one-dimensional Lebesgue
measure is translation invariant.




Problem 19. Show that if $\mu$ is a translation-invariant measure on $(\mathbb{R}, \mathcal{B})$ such that
$\mu([0, 1)) = 1$, then $\mu$ is one-dimensional Lebesgue measure.


Problem 20. Show that if $\mu$ is a translation invariant measure on $(\mathbb{R}, \mathcal{B})$, then
either $\mu(I) = \infty$ for all nonempty open intervals $I$, or $\mu$ is a finite multiple of
one-dimensional Lebesgue measure.


Example 4. Let $\Omega = [0, 1) \times [0, 1)$. Consider the collection of rectangular
boxes of the form $[a, b) \times [c, d)$, and let $\mathcal{E}$ be the collection of all finite unions of
such boxes. For $A \in \mathcal{E}$, define $R(A)$ to be the area of $A$. We leave it as a problem
to show that $\mathcal{E}$ is a field, and that $R$ is a nonnegative countably additive function
defined on $\mathcal{E}$, such that $R(\Omega) = 1$. Thus, $R$ can be extended to a probability
measure $P$ defined on the Borel subsets of $\Omega$. This probability measure was
introduced in Problem 10 of Chapter 2 as the uniform distribution on $[0, 1)^2$.
An alternative construction of $P$ will be given in Chapter 9.
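Since a member of $\mathcal{E}$ can be written as a finite union of boxes in many ways, it may help to see one way of computing its area that does not depend on the representation. The Python sketch below cuts the plane along every coordinate appearing in the given boxes and adds up the grid cells that are covered. The function name `union_area` is hypothetical, and exact fractions are used so that the comparisons are exact.

```python
from fractions import Fraction as Fr

def union_area(boxes):
    """Area of a finite union of boxes [a, b) x [c, d), each given as a
    tuple (a, b, c, d), using the grid generated by all box coordinates."""
    xs = sorted({x for a, b, c, d in boxes for x in (a, b)})
    ys = sorted({y for a, b, c, d in boxes for y in (c, d)})
    area = 0
    for x0, x1 in zip(xs, xs[1:]):
        for y0, y1 in zip(ys, ys[1:]):
            # A grid cell is either contained in a given box or disjoint from it.
            if any(a <= x0 and x1 <= b and c <= y0 and y1 <= d
                   for a, b, c, d in boxes):
                area += (x1 - x0) * (y1 - y0)
    return area

# Two overlapping boxes: the overlap is counted only once.
overlapping = [(Fr(0), Fr(1, 2), Fr(0), Fr(1, 2)),
               (Fr(1, 4), Fr(3, 4), Fr(1, 4), Fr(3, 4))]
print(union_area(overlapping))

# Two disjoint boxes that together make up the unit square.
halves = [(Fr(0), Fr(1, 2), Fr(0), Fr(1)), (Fr(1, 2), Fr(1), Fr(0), Fr(1))]
print(union_area(halves))
```

This is only a computational illustration; it does not replace the proof of countable additivity requested in the exercise that follows.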


Problem 21. For the preceding example, prove that $\mathcal{E}$ is a field, and that $R$ is a
nonnegative countably additive function such that $R(\Omega) = 1$. Hint: To prove that $\mathcal{E}$
is closed under complementation, first show that the complement of a rectangular
box can be written as the union of at most four disjoint rectangular boxes. To
prove countable additivity, follow the hint given in Problem 18.


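Example 5. [Lebesgue measure on $\mathbb{R}^2$] Let $\Omega = \mathbb{R}^2$ and let $\mathcal{B}$ be the Borel
$\sigma$-field of $\Omega$. For integers $m$ and $n$, let $A_{m,n} = [m, m+1) \times [n, n+1)$, and
let $\mu_{m,n}$ be the uniform distribution on $A_{m,n}$, defined according to the pattern
given in Example 4. Think of each measure $\mu_{m,n}$ as a probability measure on
$(\mathbb{R}^2, \mathcal{B})$. For $B \in \mathcal{B}$, set

\[ \mu(B) = \sum_{m \in \mathbb{Z}} \sum_{n \in \mathbb{Z}} \mu_{m,n}(B). \]

The $\sigma$-finite measure $\mu$ is called two-dimensional Lebesgue measure, or Lebesgue
measure on $\mathbb{R}^2$.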


Problem 22. Prove that 2-dimensional Lebesgue measure is translation invariant. 
Also prove that 2-dimensional Lebesgue measure is unique in the same sense in 
which 1-dimensional Lebesgue measure is unique (see Problem 20). 


Problem 23. Prove that 2-dimensional Lebesgue measure is rotation invariant. In
other words, if $T : \mathbb{R}^2 \to \mathbb{R}^2$ is any rotation of $\mathbb{R}^2$ and if $B$ is any Borel subset of
$\mathbb{R}^2$, show that $\mu(B) = \mu(T(B))$, where $\mu$ is 2-dimensional Lebesgue measure. Hint:
In view of Problem 22, it suffices to consider rotations about the origin.




Example 6. Let $\mathcal{L}$ denote the set of all lines in $\mathbb{R}^2$. We establish a one-to-one
correspondence between $\mathcal{L}$ and

\[ S = \{(s, \varphi) : s \in \mathbb{R},\ 0 \le \varphi < \pi\} \]

as follows. The line corresponding to $(s, \varphi)$ is that line whose distance from the
origin equals $|s|$ and which meets a certain ray emanating from the origin at a
right angle, that ray being the one with direction $\varphi$ if $s > 0$ and the one with
direction $\varphi + \pi$ if $s < 0$. Figure 7.1 illustrates this correspondence for a certain
infinite set of lines, namely those that intersect a given segment. The shaded
region in the $(s, \varphi)$-plane corresponds to that infinite set of lines, and the points
marked in the $(s, \varphi)$-plane correspond to the lines drawn in the right half of the
figure.


FIGURE 7.1. A subset of the space of lines 
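In coordinates, the line corresponding to $(s, \varphi)$ is the set of points $(x, y)$ with $x \cos\varphi + y \sin\varphi = s$; this normal-form description is an equivalent restatement used here for convenience. The Python sketch below recovers $(s, \varphi)$ from two points on a line and tests whether a line meets a given segment. The function names `params_of_line` and `hits_segment` are hypothetical.

```python
import math

def params_of_line(p, q):
    """Return (s, phi), with 0 <= phi < pi, for the line through the
    distinct points p and q, so that x*cos(phi) + y*sin(phi) = s on the line."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    # (dy, -dx) is normal to the line; reduce its angle modulo pi.
    phi = math.atan2(-dx, dy) % math.pi
    if phi >= math.pi:  # guard against rounding up to pi
        phi = 0.0
    s = p[0] * math.cos(phi) + p[1] * math.sin(phi)
    return s, phi

def hits_segment(s, phi, u, v):
    """True if the line with parameters (s, phi) meets the closed segment from u to v."""
    nx, ny = math.cos(phi), math.sin(phi)
    fu = nx * u[0] + ny * u[1] - s
    fv = nx * v[0] + ny * v[1] - s
    # The endpoints lie on opposite sides of the line, or one lies on it.
    return fu * fv <= 0

print(params_of_line((1, 0), (1, 5)))    # the vertical line x = 1
print(params_of_line((-2, 0), (-2, 1)))  # the vertical line x = -2, so s < 0
print(hits_segment(1.0, 0.0, (0, 0), (2, 0)))
```

A set of lines such as the one shown in Figure 7.1 corresponds to the set of pairs $(s, \varphi)$ for which `hits_segment` returns `True`.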


We give $S$ the relative topology inherited from the usual topology on $\mathbb{R}^2$,
and let $\mathcal{B}$ be the $\sigma$-field of Borel subsets of $S$. Let $\mu$ be the restriction of
2-dimensional Lebesgue measure to $\mathcal{B}$. The one-to-one correspondence between $S$
and $\mathcal{L}$ just described serves to turn $\mathcal{L}$ into a topological space and produces a
$\sigma$-finite measure space $(\mathcal{L}, \mathcal{A}, \nu)$, by transferring the corresponding structures from
$S$. The following exercises concern properties of the measure space $(\mathcal{L}, \mathcal{A}, \nu)$.


* Problem 24. Let $(\mathcal{L}, \mathcal{A}, \nu)$ be the measure space of the preceding example. Show
that $\nu$ is invariant under rigid motions in $\mathbb{R}^2$. You may use the fact that every
rigid motion in $\mathbb{R}^2$ is a composition of rotations about the origin and translations.


Problem 25. For the setting of the preceding problem, find the measure of the set 
of lines that intersect a fixed but arbitrary line segment. Hint: Use the preceding 
problem to show that it is sufficient to consider the case shown in Figure 7.1. 


* Problem 26. Let $(\mathcal{L}, \mathcal{A}, \nu)$ be as in the preceding exercise. Let $A$ be the set of
lines that intersect a fixed convex polygon. Calculate $\nu(A)$. Hint: First prove
that $\nu(A)$ is finite, either by a direct method or by using the preceding problem.
Let $\Omega = A$ and let $\mathcal{B}$ be the $\sigma$-field of Borel subsets of $A$. For $B \in \mathcal{B}$, define
$P(B) = \nu(B)/\nu(A)$, so that $(\Omega, \mathcal{B}, P)$ is a probability space. For each line $\omega \in \Omega$,
let $X(\omega)$ equal the number of intersections of $\omega$ with the polygon. Calculate $E(X)$
in two ways, one using the answer to the preceding problem.


Problem 27. Let $(\mathcal{L}, \mathcal{A}, \nu)$ be as in the preceding exercise. Let $A$ be the set of lines
that intersect a fixed circle of radius $r$. Calculate $\nu(A)$. As in the preceding problem,
turn $A$ into a probability space, and for that space, calculate the probability
that a line in $A$ intersects a given second circle of radius $s < r$ inside the first circle.


Problem 28. In the preceding exercise, a random line intersecting a circle is de- 
fined. Such a random line determines a random chord of the circle. Is this random 
chord equivalent to any of the interpretations given in Example 3 of Chapter 2? 


Problem 29. Let $C$ be a circle, and let $D$ be a curve contained in the interior of
$C$. Assume that $D$ consists of straight line segments (possibly infinitely many),
and that $D$ does not intersect itself. Let $\omega$ be a random line intersecting $C$, as
in the preceding two problems, and let $X(\omega)$ be the number of intersections of $D$
and $\omega$. Calculate $E(X)$ in terms of the radius $r$ of the circle $C$ and some constant
associated with $D$.


CHAPTER 8 
Integration Theory 


In this chapter we have three main goals. The first is to extend the concept 
of expectation to general measure spaces. In this context, we do not use the 
phrase “expectation of a random variable”, but instead we introduce the ‘inte- 
gral’ of a measurable function. This new kind of integral is called the ‘Lebesgue 
integral’. Our second goal is to introduce several tools, which along with the 
Monotone Convergence Theorem, are valuable for interchanging limit opera- 
tions with expectation and integration. Our third goal is to explore some of the 
similarities and differences between Lebesgue integration and Riemann integra- 
tion. As in the case of expectations, the Riemann-Stieltjes integral will be useful 
for computations of Lebesgue integrals. Also useful in such calculations is the 
‘Radon-Nikodym derivative’, introduced near the end of the chapter. 


8.1. Lebesgue integration 


The definition of the Lebesgue integral in a general measure space is completely 
analogous to the definition of the expectation in a probability space. Neverthe- 
less, we quickly repeat here the steps in that definition, both to establish some 
new notation, and also to provide a natural opportunity for review. 

We start with measurable simple functions. However, since measurable sets 
can have infinite measure in a general measure space, we find it easiest to restrict 
our attention at first to nonnegative measurable simple functions. 


Definition 1. Let $f$ be a measurable function defined on a measure space
$(\Omega, \mathcal{F}, \mu)$ and having the form


\[ f = \sum_{j=1}^{n} c_j I_{C_j} \]


for some nonnegative constants $c_j$ and measurable sets $C_j$. Then the Lebesgue
integral of $f$ equals


\[ \sum_{j=1}^{n} c_j \, \mu(C_j) \]


and is denoted by


\[ \int f \, d\mu. \]

The reader may note that we did not assume in this definition that the constants
$c_j$ are distinct, nor did we assume that the sets $C_j$ form a partition of $\Omega$,
in contrast to Definition 1 of Chapter 4. The reason is that we already know
from our work in Chapter 4 that no ambiguity can result from this relaxing of
assumptions (see Problem 4 of Chapter 4).
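
To make the last remark concrete, here is a small Python sketch on a three-point measure space. The space, the weights, and the function names are hypothetical choices made only for illustration. It evaluates $\sum_j c_j \mu(C_j)$ for two different representations of the same simple function, one with overlapping sets and one with a partition.

```python
from fractions import Fraction

# A hypothetical finite measure space: each point carries a weight.
mu = {"a": Fraction(1, 2), "b": Fraction(3), "c": Fraction(2)}

def measure(mu, subset):
    """mu(C) for a subset C of the finite space."""
    return sum(mu[w] for w in subset)

def integral_simple(mu, terms):
    """Sum of c_j * mu(C_j) over a list of (c_j, C_j) pairs."""
    return sum(c * measure(mu, C) for c, C in terms)

# f = 1*I_{a,b} + 2*I_{b,c}; the sets overlap at the point b.
overlapping = [(1, {"a", "b"}), (2, {"b", "c"})]
# The same function written over a partition: f(a)=1, f(b)=3, f(c)=2.
partition = [(1, {"a"}), (3, {"b"}), (2, {"c"})]

print(integral_simple(mu, overlapping))  # 27/2
print(integral_simple(mu, partition))    # 27/2
```

Both representations give the same value, as the remark asserts. The sketch covers only finite weights; sets of infinite measure would need the convention $0 \cdot \infty = 0$.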

Going from nonnegative measurable simple functions to arbitrary measurable 
functions is just as before. The fact that the following definition is unambiguous 
is proved just as in Chapter 4. 


Definition 2. Let $f$ be a measurable function defined on a measure space
$(\Omega, \mathcal{F}, \mu)$. If $f$ is $\overline{\mathbb{R}}^+$-valued, then


\[ \int f \, d\mu = \sup_h \int h \, d\mu, \]


where the supremum is taken over all nonnegative measurable simple functions
$h$ such that $h \le f$. If $f$ is $\overline{\mathbb{R}}$-valued, then


\[ \int f \, d\mu = \int f^+ \, d\mu - \int f^- \, d\mu, \]


provided the two integrals on the right side of this expression are not both
infinite; otherwise, $\int f \, d\mu$ does not exist. When it exists, $\int f \, d\mu$ is called the
Lebesgue integral of $f$ with respect to $\mu$.


It should be clear to the reader that the Lebesgue integral is indeed a generalization
of expectation; that is, if $\mu$ happens to be a probability measure, then
$\int f \, d\mu = E_\mu(f)$. In fact, as the following result shows, if $\mu$ can be written as a
countable sum of finite measures, then $\int f \, d\mu$ can be expressed directly in terms
of expectations.


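Proposition 3. Let $(\Omega, \mathcal{F}, \mu)$ be a measure space such that $\mu(\Omega) > 0$. Suppose
there exist finite measures $\mu_j$, $j = 1, 2, \dots$, on $(\Omega, \mathcal{F})$ such that $\mu(A) =
\sum_{j=1}^{\infty} \mu_j(A)$ for all $A \in \mathcal{F}$. We may assume without loss of generality that
$\mu_j(\Omega) > 0$ for each $j$, so that the formula


\[ P_j(A) = \mu_j(A)/\mu_j(\Omega) \quad \text{for } A \in \mathcal{F} \]


defines probability measures $P_j$ on $(\Omega, \mathcal{F})$. Then for all measurable functions
$f \colon \Omega \to \overline{\mathbb{R}}^+$,


\[ \int f \, d\mu = \sum_{j=1}^{\infty} \mu_j(\Omega) \, E_{P_j}(f). \tag{8.1} \]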


PROOF. We leave it to the reader to check that (8.1) holds for nonnegative
measurable simple functions $f$. It follows immediately that the left side of (8.1)
is less than or equal to the right side for all measurable $\overline{\mathbb{R}}^+$-valued functions
$f$. On the other hand, for any integer $n$ and nonnegative measurable simple
functions $f_j \le f$, $j = 1, \dots, n$,


\[ \sum_{j=1}^{n} \mu_j(\Omega) E_{P_j}(f_j) \le \sum_{j=1}^{n} \mu_j(\Omega) E_{P_j}\Big(\sup_{1 \le i \le n} f_i\Big) \le \int \sup_{1 \le j \le n} f_j \, d\mu, \]


since the supremum of finitely many nonnegative measurable simple functions is
a nonnegative measurable simple function. By the definition of the integral, it
follows that


\[ \sum_{j=1}^{n} \mu_j(\Omega) E_{P_j}(f_j) \le \int f \, d\mu. \]


Take the supremum over all nonnegative measurable simple functions $f_j \le f$ to
obtain


\[ \sum_{j=1}^{n} \mu_j(\Omega) E_{P_j}(f) \le \int f \, d\mu. \]


Now let $n$ go to $\infty$ to see that the left side of (8.1) is greater than or equal to
the right side. It follows that the two sides are equal. □
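
The identity (8.1) can be checked by hand on a finite space. The following Python sketch does so for a measure that is the sum of just two finite measures; the space, the weights, and the function `f` are hypothetical values chosen for illustration.

```python
from fractions import Fraction

omega = ["a", "b", "c"]
# Two hypothetical finite measures on omega, given by point weights.
mu1 = {"a": Fraction(1), "b": Fraction(1), "c": Fraction(0)}
mu2 = {"a": Fraction(0), "b": Fraction(2), "c": Fraction(4)}
f = {"a": Fraction(3), "b": Fraction(1), "c": Fraction(5)}

def integral(f, m):
    """Integral of f with respect to a measure given by point weights."""
    return sum(f[w] * m[w] for w in m)

def total_mass(m):
    return sum(m.values())

def normalize(m):
    """The probability measure P_j = mu_j / mu_j(Omega)."""
    mass = total_mass(m)
    return {w: v / mass for w, v in m.items()}

mu = {w: mu1[w] + mu2[w] for w in omega}
lhs = integral(f, mu)
rhs = sum(total_mass(m) * integral(f, normalize(m)) for m in (mu1, mu2))
print(lhs, rhs)  # both sides equal 26
```

Here $\mu_1(\Omega) = 2$ with $E_{P_1}(f) = 2$, and $\mu_2(\Omega) = 6$ with $E_{P_2}(f) = 11/3$, so the right side is $4 + 22 = 26$.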


The preceding result makes it easy to generalize many of the results about
expectation to Lebesgue integration on $\sigma$-finite measure spaces. The following
result often allows us to go beyond $\sigma$-finite measure spaces to general measure
spaces.


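Proposition 4. Let $(\Omega, \mathcal{F}, \mu)$ be a measure space, and let $f$ be an $\overline{\mathbb{R}}^+$-valued
measurable function defined on $\Omega$. Let $B = \{x \in \Omega : f(x) > 0\}$, and define
$\nu(A) = \mu(A \cap B)$ for all $A \in \mathcal{F}$. Then


\[ \int f \, d\mu = \int f \, d\nu. \]


Moreover, if $\int f \, d\mu < \infty$, then $(\Omega, \mathcal{F}, \nu)$ is a $\sigma$-finite measure space.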


Problem 1. Prove the preceding proposition. Hint: For $j = 1, 2, \dots$, let $B_j = \{x \in
\Omega : 1/(j-1) > f(x) \ge 1/j\}$ and define $\nu_j(A) = \mu(A \cap B_j)$.




The following result generalizes Theorem 9 of Chapter 4 to the Lebesgue 
integration setting. Each part is proved either exactly as in the proof of the 
corresponding part of Theorem 9 of Chapter 4, or by using Proposition 3 and 
Proposition 4 in a straightforward way to generalize the corresponding part of 
Theorem 9 of Chapter 4 to the Lebesgue integral. 

In the statement of this theorem, we use the term ‘$\mu$-a.e.’ (‘$\mu$-almost everywhere’).
This term, which is analogous to ‘a.s.’, means that the statement to
which it is attached is true except on a set of $\mu$-measure 0. Often, when the
measure $\mu$ is understood from the context, we use ‘a.e.’ in the place of ‘$\mu$-a.e.’,
and ‘almost everywhere’ in the place of ‘$\mu$-almost everywhere’.


Theorem 5. Let $f$ and $g$ be $\overline{\mathbb{R}}$-valued functions defined on a measure space
$(\Omega, \mathcal{F}, \mu)$, and let $a$, $b$, and $c$ be real constants.


(i) $\int (af + bg) \, d\mu = a \int f \, d\mu + b \int g \, d\mu$, provided the expression on the
right is meaningful.

(ii) If $f = c$ a.e., then $\int f \, d\mu = c\,\mu(\Omega)$, where as usual, $0 \cdot \infty$ is understood
to be 0.

(iii) If $f = g$ a.e., then either the integrals of $f$ and $g$ with respect to $\mu$
both exist and are equal, or neither exists.

(iv) If $f \le g$ a.e. and either $\int f \, d\mu$ exists and is different from $-\infty$
or $\int g \, d\mu$ exists and is different from $\infty$, then both integrals exist and
$\int f \, d\mu \le \int g \, d\mu$.

(v) If $\int f \, d\mu = \int g \, d\mu$ is finite and $f \le g$ a.e., then $f = g$ a.e.

(vi) If $\int f \, d\mu$ exists, then $|\int f \, d\mu| \le \int |f| \, d\mu$.

(vii) If $\int f \, d\mu$ does not exist, then $\int |f| \, d\mu = \infty$.

(viii) $\int |f + g| \, d\mu \le \int |f| \, d\mu + \int |g| \, d\mu$.


The following notation for the integral of the product of a measurable function
$f$ and a measurable indicator function $I_B$ is useful:


\[ \int_B f \, d\mu = \int f I_B \, d\mu. \]


One speaks of “integrating the function $f$ with respect to $\mu$ over the set $B$”. The
corresponding notation in the context of expectations is


\[ E(X ; B) = E(X I_B). \]


Problem 2. Show that if $\int f \, d\mu$ exists, then $\int_B f \, d\mu$ exists for all measurable sets
$B$.


Problem 3. Given a measure space $(\Omega, \mathcal{F}, \mu)$ and a measurable set $B \in \mathcal{F}$, define
$\mu_B(A) = \mu(A \cap B)$ for all $A \in \mathcal{F}$. Show that


\[ \int_B f \, d\mu = \int f \, d\mu_B \]


in the sense that if one side exists, then both sides exist and are equal.




8.2. Convergence theorems 


We start by generalizing the Monotone Convergence Theorem to the Lebesgue 
integral. 


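Theorem 6. [Monotone Convergence] Let $0 \le f_1 \le f_2 \le \dots$ be $\overline{\mathbb{R}}^+$-valued
measurable functions defined on a common measure space $(\Omega, \mathcal{F}, \mu)$, and let $f =
\lim_{n\to\infty} f_n$. Then

\[ \int f \, d\mu = \lim_{n\to\infty} \int f_n \, d\mu. \]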


PROOF. By Theorem 5,


\[ \int f_1 \, d\mu \le \int f_2 \, d\mu \le \dots \le \int f \, d\mu. \]


Thus, if $\int f_n \, d\mu = \infty$ for any positive integer $n$, we are done.

Suppose, then, that $\int f_n \, d\mu < \infty$ for all $n$. Let $B$ and $\nu$ be as in the statement
of Proposition 4, let $B_n = \{x \in \Omega : f_n(x) > 0\}$, and define $\nu_n(A) = \mu(A \cap B_n)$
for $A \in \mathcal{F}$. By Proposition 4, each of the measures $\nu_n$ is $\sigma$-finite. Let $B_0 = \emptyset$
and define $\tilde{\nu}_n(A) = \nu_n(A \cap B_n \cap B_{n-1}^c)$ for $n = 1, 2, \dots$. It is easily checked
that $(\tilde{\nu}_1, \tilde{\nu}_2, \dots)$ is a pairwise mutually singular sequence of $\sigma$-finite measures.
By countable additivity,


\[ \nu(A) = \sum_{n=1}^{\infty} \tilde{\nu}_n(A), \]


so $\nu$ is $\sigma$-finite by Problem 19 of Chapter 6. By Proposition 4, $\int f \, d\nu = \int f \, d\mu$,
and for similar reasons, $\int f_n \, d\nu = \int f_n \, d\mu$ for all $n$. Thus, it is enough to prove
the theorem with $\mu$ replaced by the $\sigma$-finite measure $\nu$.

Since $\nu$ is a countable sum of finite measures, it follows from Proposition 3
that there are probability measures $(P_j : j = 1, 2, \dots)$ on $(\Omega, \mathcal{F})$ and nonnegative
constants $c_j$, $j = 1, 2, \dots$, such that


\[ \int f \, d\nu = \sum_{j=1}^{\infty} c_j E_{P_j}(f). \]


The same formula holds with $f$ replaced by $f_n$. By first applying the Monotone
Convergence Theorem for expectations, and then applying the Monotone
Convergence Theorem for sums (Corollary 9 of Chapter 6), we have


\[ \int f \, d\nu = \sum_{j=1}^{\infty} c_j E_{P_j}(f)
 = \sum_{j=1}^{\infty} \Big( \lim_{n\to\infty} c_j E_{P_j}(f_n) \Big)
 = \lim_{n\to\infty} \sum_{j=1}^{\infty} c_j E_{P_j}(f_n)
 = \lim_{n\to\infty} \int f_n \, d\nu. \quad \Box \]
J= 


Of course, Corollary 12 of the Monotone Convergence Theorem of Chapter 4 
generalizes as well: 


Corollary 7. If $(f_1, f_2, \dots)$ is a sequence of $\overline{\mathbb{R}}^+$-valued measurable functions
defined on a measure space $(\Omega, \mathcal{F}, \mu)$, then


\[ \int \Big( \sum_{n=1}^{\infty} f_n \Big) d\mu = \sum_{n=1}^{\infty} \int f_n \, d\mu. \]


The following exercise is an application of the Monotone Convergence The- 
orem to obtain a limited extension of the Continuity of Measure Theorem to 
infinite measure spaces. See also Problem 5 and Problem 9 for further devel- 
opments in this direction. (We have already seen in Chapter 6 that the full 
Continuity of Measure Theorem does not generalize to infinite measure spaces.) 


Problem 4. [Monotone Continuity of Measure Theorem] Let $(A_n : n = 1, 2, \dots)$
be an increasing sequence of measurable sets in a measure space $(\Omega, \mathcal{F}, \mu)$. Show
that


\[ \mu(A_n) \nearrow \mu(A) \quad \text{as } n \nearrow \infty, \]


where $A = \lim_{n\to\infty} A_n$.


The following useful result is an easy consequence of the Monotone Conver- 
gence Theorem. 


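Lemma 8. [Fatou] Let $(f_1, f_2, \dots)$ be a sequence of $\overline{\mathbb{R}}^+$-valued measurable
functions defined on a measure space $(\Omega, \mathcal{F}, \mu)$. Then


\[ \int \big( \liminf_{n\to\infty} f_n \big) \, d\mu \le \liminf_{n\to\infty} \int f_n \, d\mu. \]


PROOF. For $m = 1, 2, \dots$, let $g_m = \inf\{f_n : n \ge m\}$. Then for each $m$,
$g_m \le f_m$, and

\[ g_m \nearrow \liminf_{n\to\infty} f_n \quad \text{as } m \nearrow \infty. \]


By the Monotone Convergence Theorem,


\[ \int \big( \liminf_{n\to\infty} f_n \big) \, d\mu = \lim_{m\to\infty} \int g_m \, d\mu = \liminf_{m\to\infty} \int g_m \, d\mu \le \liminf_{m\to\infty} \int f_m \, d\mu. \quad \Box \]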
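
The inequality in the Fatou Lemma can be strict. A standard illustration, which is our own choice of example and does not come from the text, takes Lebesgue measure on $(0, 1)$ and $f_n = n I_{(0, 1/n)}$. Each $f_n$ has integral 1, while $\liminf f_n = 0$ at every point. The Python sketch below checks both facts with exact arithmetic; the function names are hypothetical.

```python
from fractions import Fraction
import math

def f(n, x):
    """The function f_n = n * I_(0, 1/n), evaluated at the point x."""
    return n if 0 < x < Fraction(1, n) else 0

def integral_f(n):
    """Integral of f_n over (0, 1): height n times interval length 1/n."""
    return n * Fraction(1, n)

def is_eventually_zero(x, extra=100):
    """Check f_n(x) = 0 for the first `extra` indices n >= 1/x."""
    start = math.ceil(1 / x)
    return all(f(n, x) == 0 for n in range(start, start + extra))

print([integral_f(n) for n in (1, 10, 1000)])         # every integral is 1
print(is_eventually_zero(Fraction(1, 7)))             # True
```

So the left side of the Fatou inequality is 0 and the right side is 1. No integrable function dominates all the $f_n$ in this example, so it does not conflict with any result that assumes domination.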


Problem 5. Let $(A_n : n = 1, 2, \dots)$ be a sequence of measurable sets in a measure
space $(\Omega, \mathcal{F}, \mu)$. Show that


\[ \liminf_{n\to\infty} \mu(A_n) \ge \mu\big(\liminf_{n\to\infty} A_n\big). \]


Also show that the following inequality does not hold in the general measure space
setting:

\[ \limsup_{n\to\infty} \mu(A_n) \le \mu\big(\limsup_{n\to\infty} A_n\big). \]


(Compare Problem 9 of Chapter 6.)




Problem 6. For the coin-flip probability space, Example 4 of Chapter 1, let $X_n$
denote the indicator function of the event that the $n$th flip is heads. Calculate
$\liminf E(X_n)$ and $E(\liminf X_n)$.


Problem 7. Let $(X_1, X_2, \dots)$ be a sequence of random variables that converges
almost surely to a random variable $X$. Show that if $\sup_n E(X_n^2) < \infty$, then $E(X^2) <
\infty$.


The purpose of the next convergence theorem is similar to that of the Monotone
Convergence Theorem. The hypothesis is not that the sequence $(f_1, f_2, \dots)$
be monotone but instead that it be dominated by a measurable function $g$ having
finite integral. The role of this ‘dominating function’ $g$ should be compared with
the role of $X$ in Corollary 13 of Chapter 4.


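Theorem 9. [Dominated Convergence] Let $f_1, f_2, \dots$ be $\overline{\mathbb{R}}$-valued measurable
functions defined on a measure space $(\Omega, \mathcal{F}, \mu)$, and suppose that $g$ is a
nonnegative measurable function defined on $(\Omega, \mathcal{F}, \mu)$ such that, for each $n$,
$|f_n| \le g$ a.e. If $\int g \, d\mu < \infty$, then


\[ -\infty < \int \big( \liminf_{n\to\infty} f_n \big) \, d\mu \le \liminf_{n\to\infty} \int f_n \, d\mu
 \le \limsup_{n\to\infty} \int f_n \, d\mu \le \int \big( \limsup_{n\to\infty} f_n \big) \, d\mu < \infty. \tag{8.2} \]


If, in addition, $f = \lim_{n\to\infty} f_n$ exists almost everywhere, then


\[ \int |f| \, d\mu < \infty, \quad \lim_{n\to\infty} \int f_n \, d\mu = \int f \, d\mu, \quad \text{and} \quad \lim_{n\to\infty} \int |f - f_n| \, d\mu = 0. \]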


* Problem 8. Prove the preceding theorem. Hint: For the first part, apply the Fatou 
Lemma to the functions g — fn and g + fn. 


Problem 9. [Dominated Continuity of Measure Theorem] Let $(A_n : n = 1, 2, \dots)$
be a sequence of measurable sets in a measure space $(\Omega, \mathcal{F}, \mu)$. Show that if $A =
\lim_{n\to\infty} A_n$ exists and if


\[ \mu\Big( \bigcup_{n=1}^{\infty} A_n \Big) < \infty, \]


then

\[ \mu(A) = \lim_{n\to\infty} \mu(A_n). \]


Our remaining convergence results apply only to finite measure spaces. In 
order to emphasize this restriction, we state them in the context of probability 
spaces. 




Theorem 10. [Bounded Convergence] Let $(X_1, X_2, \dots)$ be a sequence of
$\overline{\mathbb{R}}$-valued random variables on a probability space $(\Omega, \mathcal{F}, P)$. Assume that $X =
\lim_{n\to\infty} X_n$ exists almost surely. Suppose that there exists a finite constant $M$
such that for all $n \ge 1$, $|X_n| \le M$ a.s. Then


\[ E(X) \le M, \quad \lim_{n\to\infty} E(X_n) = E(X), \quad \text{and} \quad \lim_{n\to\infty} E(|X - X_n|) = 0. \]


Problem 10. Prove the preceding theorem. Also provide an appropriate counterex- 
ample to show that it is false in any infinite measure space. 


Problem 11. Let $Y$ be an $\overline{\mathbb{R}}$-valued random variable for which $E(|Y|) < \infty$. For
$c > 1$, let $I_c$ be the indicator function of the event $\{\omega : |Y(\omega)| > c\}$. Prove that


\[ \lim_{c\to\infty} E(Y I_c) = \lim_{c\to\infty} E(|Y| I_c) = 0. \]


For our last convergence result we need a definition. It is motivated in part 
by the preceding problem. 


Definition 11. For $\overline{\mathbb{R}}$-valued random variables $X_t$, $t \in T$, defined on a
probability space $(\Omega, \mathcal{F}, P)$, $(X_t : t \in T)$ is uniformly integrable if


\[ \lim_{c\to\infty} \sup_{t \in T} E(|X_t| I_{t,c}) = 0, \]


where, for each $t \in T$ and $c > 0$, $I_{t,c}$ is the indicator function of the set $\{\omega \in
\Omega : |X_t(\omega)| > c\}$.
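
The following Python sketch evaluates the quantity $\sup_n E(|X_n| I_{n,c})$ for two hypothetical sequences on the unit interval with Lebesgue measure, both of the form $X_n = h(n) I_{(0, 1/n)}$. These sequences and the function names are our own illustrative choices. For $h(n) = n$ the supremum stays at 1 for every $c$, so the sequence is not uniformly integrable; for $h(n) = \sqrt{n}$ the supremum tends to 0.

```python
import math

def tail_expectation(height, n, c):
    """E(|X_n| ; |X_n| > c) for X_n = height(n) on (0, 1/n) and 0 elsewhere."""
    h = height(n)
    return h / n if h > c else 0.0

def sup_tail(height, c, n_max=20_000):
    """Maximum of the tail expectations over n = 1, ..., n_max.

    This is only a finite-range stand-in for the supremum over all n.
    """
    return max(tail_expectation(height, n, c) for n in range(1, n_max + 1))

for c in (1, 10, 100):
    print(c, sup_tail(float, c), sup_tail(math.sqrt, c))
```

The second sequence satisfies $E(X_n^2) = 1$ for every $n$, so its uniform integrability is consistent with the moment condition in Problem 12 below.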


Theorem 12. [Uniform Integrability Criterion] Let $(X_1, X_2, \dots)$ be a sequence
of $\overline{\mathbb{R}}$-valued random variables on a probability space $(\Omega, \mathcal{F}, P)$. Assume
that $X = \lim_{n\to\infty} X_n$ exists almost surely, and that $E(|X_n|) < \infty$ for all $n$.
Then the following three statements are equivalent:
(i) $(X_n : n = 1, 2, \dots)$ is uniformly integrable;
(ii) $E(|X|) < \infty$ and $\lim_{n\to\infty} E(|X_n - X|) = 0$;
(iii) $\lim_{n\to\infty} E(|X_n|) = E(|X|) < \infty$.

Each of these three conditions implies
(iv) $\lim_{n\to\infty} E(X_n) = E(X)$.

PROOF. By using the basic properties of expectations found in Theorem 9 of
Chapter 4, we obtain


\[ \big| E(|X_n|) - E(|X|) \big| \le E\big( \big| |X_n| - |X| \big| \big) \le E(|X_n - X|) \]


and

\[ |E(X_n - X)| \le E(|X_n - X|). \]


Thus, (ii) implies (iii) and (iv).




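Next we suppose that (i) holds and prove (ii). For $c \in (0, \infty)$ and $n = 1, 2, \dots$,
set


\[ Z_{n,c} = X_n \cdot \big( I_{[-c,c]^c} \circ X_n \big) \quad \text{and} \quad Z_c = X \cdot \big( I_{[-c,c]^c} \circ X \big). \]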


Given $\varepsilon > 0$, we use uniform integrability to choose $c$ so that $E(|Z_{n,c}|) < \varepsilon/3$ for all $n$. Since
$|Z_c| \le \liminf_{n\to\infty} |Z_{n,c}|$, we obtain from the Fatou Lemma that $E(|Z_c|) \le \varepsilon/3$.
Hence,

\[ E(|X|) = E(|Z_c|) + E(|X| - |Z_c|) \le \frac{\varepsilon}{3} + c < \infty, \]


proving the first part of (ii). Also,


\[ E(|X_n - X|) \le E(|Z_{n,c}|) + E(|Z_c|) + E\big( |(X_n - Z_{n,c}) - (X - Z_c)| \big)
 < \frac{2\varepsilon}{3} + E\big( |(X_n - Z_{n,c}) - (X - Z_c)| \big). \]

The Bounded Convergence Theorem implies that the last expectation on the
right is less than $\varepsilon/3$ for all sufficiently large $n$. The rest of (ii) follows.
It remains to prove that (iii) implies (i). For this part of the proof, there is
no loss of generality in assuming that each $X_n$ is $\overline{\mathbb{R}}^+$-valued. This assumption
allows us to use (iv) as well as (iii).
For $c \in (0, \infty)$ and $n = 1, 2, \dots$, set


\[ Y_{n,c} = X_n \cdot \big( I_{(c,\infty]} \circ X_n \big) \quad \text{and} \quad Y_c = X \cdot \big( I_{[c,\infty]} \circ X \big). \]

Clearly, $X_n - Y_{n,c} \ge 0$ for each $n$ and $c$. By the Fatou Lemma,

\[ \liminf_{n\to\infty} E(X_n - Y_{n,c}) \ge E\big( \liminf_{n\to\infty} (X_n - Y_{n,c}) \big) = E\big( X - \limsup_{n\to\infty} Y_{n,c} \big). \]


Condition (iv), linearity of expectation, and the preceding inequality yield


\[ \begin{aligned}
E(X) - \limsup_{n\to\infty} E(Y_{n,c}) &= \lim_{n\to\infty} E(X_n) - \limsup_{n\to\infty} E(Y_{n,c}) \\
&= \liminf_{n\to\infty} \big[ E(X_n) - E(Y_{n,c}) \big] \\
&= \liminf_{n\to\infty} E(X_n - Y_{n,c}) \\
&\ge E(X) - E\big( \limsup_{n\to\infty} Y_{n,c} \big),
\end{aligned} \]


from which it follows that

\[ \limsup_{n\to\infty} E(Y_{n,c}) \le E\big( \limsup_{n\to\infty} Y_{n,c} \big) \le E(Y_c). \tag{8.3} \]


Let $\varepsilon > 0$. Choose $c_0$ so that $E(Y_{c_0}) < \varepsilon/2$ (see Problem 11). By (8.3) we
may choose $m$ so that $E(Y_{n,c_0}) < E(Y_{c_0}) + \varepsilon/2$ for $n > m$. Then choose $c_n$,
$1 \le n \le m$, so that $E(Y_{n,c_n}) < \varepsilon$ for $1 \le n \le m$ (again, see Problem 11). Set
$c^* = \max\{c_n : 0 \le n \le m\}$. Then $E(Y_{n,c^*}) < \varepsilon$ for all $n$. Therefore, (i) holds. □


The following problem gives a useful condition for checking uniform integra- 
bility. 




* Problem 12. Let $X_t$, $t \in T$, be $\overline{\mathbb{R}}$-valued random variables on a probability space
$(\Omega, \mathcal{F}, P)$, and suppose that there exist $p > 1$ and $k < \infty$ such that $E(|X_t|^p) < k$
for all $t \in T$. Prove that $(X_t : t \in T)$ is uniformly integrable.


Problem 13. Let $(X_1, X_2, \dots)$ be a sequence of random variables, each with finite
mean. Assume that $\lim_{n\to\infty} X_n = 0$ almost surely and that $\sup_n \mathrm{Var}(X_n) < \infty$.
Show that $\lim_{n\to\infty} E(|X_n|) = 0$. Hint: First use the Chebyshev Inequality to show
that $\sup_n |E(X_n)| < \infty$.


8.3. Probability measures and infinite measures compared 


In this section, we wish to summarize the most important similarities and dif- 
ferences between probability spaces and infinite measure spaces. 

First, let us compare expectations and Lebesgue integrals. We have seen that 
the Monotone and Dominated Convergence Theorems and the Fatou Lemma 
are valid for both. We also obtained, in Theorem 5, generalizations of the eight 
properties from Theorem 9 of Chapter 4, with only property (ii) requiring any 
significant modification. It turns out that the Cauchy-Schwarz Inequality also 
holds in general; the proof in Chapter 4 applies to the general setting without 
change. 

On the other hand, the Bounded Convergence Theorem and the Uniform 
Integrability Criterion fail in general. If the measure is infinite, boundedness 
of a function does not even ensure that its integral exists, and, as defined in 
Definition 11, ‘uniform integrability’ is useless. 

The Jensen Inequality also fails in the infinite setting. For a simple counterexample,
let $\mu$ be Lebesgue measure on the interval $[1, \infty)$, and let $f(x) = 1/x$ and
$\varphi(x) = x^2$. Then

\[ \varphi\Big( \int f \, d\mu \Big) = \infty \quad \text{and} \quad \int \varphi \circ f \, d\mu = 1. \]
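
The two values in this counterexample can be seen from the integrals over $[1, T]$, which have the closed forms $\int_1^T x^{-1}\,dx = \log T$ and $\int_1^T x^{-2}\,dx = 1 - 1/T$. The Python sketch below tabulates them for a few hypothetical cutoffs $T$; the function names are ours.

```python
import math

def integral_f(T):
    """Integral of f(x) = 1/x over [1, T], in closed form."""
    return math.log(T)

def integral_phi_of_f(T):
    """Integral of (phi o f)(x) = 1/x**2 over [1, T], in closed form."""
    return 1.0 - 1.0 / T

for T in (10.0, 1e3, 1e6, 1e12):
    # The first column grows without bound; the second stays below 1.
    print(T, integral_f(T) ** 2, integral_phi_of_f(T))
```

As $T \to \infty$ the square of the first integral diverges while the second integral increases to 1, so the inequality $\varphi(\int f \, d\mu) \le \int \varphi \circ f \, d\mu$ fails here.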


Problem 14. What is the appropriate statement of the Jensen Inequality for a 
finite measure space? 


Turning from results about integration with respect to infinite measures to 
results about infinite measures themselves, we note first that the Continuity of 
Measure Theorem does not generalize completely, but that restricted versions of 
it do (see Problem 4, Problem 5, and Problem 9). The Borel Lemma and its proof 
generalize without significant change (but this is not true of the Borel-Cantelli 
Lemma—see below). A great deal of the existence and uniqueness theory from 
Chapter 7 also carries over, but we do not concern ourselves with the details in 
this book, since Problem 19 of Chapter 6 provides us with a sufficiently powerful 
tool for constructing infinite measures from finite ones. 




Finally, we note several concepts and results from probability theory that are 
either meaningless, useless, or obviously wrong in the infinite setting: variance, 
covariance, correlation, the Chebyshev and Markov Inequalities, and the Law of 
Large Numbers. The following example also shows that any attempt to generalize 
the Kochen-Stone and Borel-Cantelli Lemmas is doomed to failure: Let $\mu$ be
Lebesgue measure on $(\mathbb{R}, \mathcal{B})$, and let $A_n = (n, n+1)$.


8.4. Lebesgue integrals and Riemann-Stieltjes integrals 


A measure $\mu$ on a Borel $\sigma$-field is called a Radon measure if $\mu(C) < \infty$ for every
compact set $C$. In this section we consider Radon measures on the measurable
space $(\mathbb{R}, \mathcal{B})$.


Problem 15. Prove that a Radon measure $\mu$ on $(\mathbb{R}, \mathcal{B})$ is necessarily $\sigma$-finite.


Definition 13. Let $F$ be a function defined on $\mathbb{R}$, and let $\mu$ be a Radon
measure on $\mathbb{R}$. We say that $F$ is a distribution function for $\mu$ if


\[ \mu((a, b]) = F(b) - F(a) \]

for all real $a < b$.


It is clear that two distribution functions correspond to the same Radon measure
if and only if their difference is a constant. By methods used in the proofs
of Proposition 3 and Proposition 4, both of Chapter 3, it can be shown that
every distribution function is increasing and right-continuous, and that to a
function $F$ on $\mathbb{R}$ with these properties there corresponds a unique Radon measure
$\mu$ such that $F$ is a distribution function of $\mu$. Given a Radon measure $\mu$ on
$(\mathbb{R}, \mathcal{B})$, a corresponding distribution function $F$ may be constructed by defining
$F(x) = \mu((0, x])$ for $x \ge 0$, and $F(x) = -\mu((x, 0])$ for $x < 0$. For Lebesgue
measure on $\mathbb{R}$, this construction gives us the distribution function $x \mapsto x$.
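
As a check on this construction, the following Python sketch applies it to a hypothetical purely atomic Radon measure, with $F(x) = \mu((0, x])$ for $x \ge 0$ and $F(x) = -\mu((x, 0])$ for $x < 0$, and verifies the defining identity $\mu((a, b]) = F(b) - F(a)$ on a grid of endpoints. The atoms, their masses, and the function names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations

# A hypothetical atomic measure: position -> mass.
atoms = {
    Fraction(-3, 2): Fraction(2),
    Fraction(0): Fraction(1),
    Fraction(1): Fraction(1, 2),
    Fraction(4): Fraction(3),
}

def mu_half_open(a, b):
    """mu((a, b]) for the atomic measure above."""
    return sum(mass for x, mass in atoms.items() if a < x <= b)

def F(x):
    """Distribution function: mu((0, x]) for x >= 0, -mu((x, 0]) for x < 0."""
    return mu_half_open(0, x) if x >= 0 else -mu_half_open(x, 0)

grid = [Fraction(k, 2) for k in range(-8, 13)]
ok = all(mu_half_open(a, b) == F(b) - F(a) for a, b in combinations(grid, 2))
print(ok)  # True
```

The function `F` jumps at each atom and is right-continuous there, because the intervals used in its definition are closed on the right.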

Often, integrals with respect to $\mu$ can be calculated as Riemann-Stieltjes integrals
with respect to $F$. The following statement, a generalization of Theorem 17
of Chapter 4, is a straightforward consequence of that theorem.


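Theorem 14. Let $\mu$ be a Radon measure on $(\mathbb{R}, \mathcal{B})$ with distribution function
$F$, and let $\varphi$ be an $\mathbb{R}$-valued function that is Riemann-Stieltjes integrable with
respect to $F$ on every bounded interval. If $\int \varphi \, d\mu$ exists, then


\[ \int \varphi \, d\mu = \int_{-\infty}^{\infty} \varphi(x) \, dF(x). \]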


Remark 1. The two integrals in the preceding theorem are both limits of
sums involving functions that approximate $\varphi$. The approximating functions for
the integral on the left are simple functions, while the ones for the integral on the
right are step functions. Using simple functions to approximate $\varphi$ amounts to
partitioning the target space of $\varphi$, while using step functions involves partitioning
the domain.


Problem 16. Let $\mu$ be a Radon measure on $(\mathbb{R}, \mathcal{B})$ with distribution function $F$, and
let $\varphi$ be a monotone function with a continuous derivative defined on an interval
$[a, b]$ contained in $\mathbb{R}$. Find a formula for


\[ \int_{(a,b]} \varphi \, d\mu \]


which involves only a Riemann integral and the values of $\varphi$ and $F$ at $a$ and $b$. Hint:
If the domain of $\varphi$ is extended to $(-\infty, \infty)$ by defining $\varphi(x) = 0$ for $x \notin (a, b)$, $\varphi$
is Riemann-Stieltjes integrable with respect to every distribution function $F$.


Problem 17. What changes are necessary in the formula obtained in the preceding 
exercise if (a,b] is replaced by: (i) [a,b], (ii) [a,b), and (iii) (a,b)? Prove the 
assertions you make. 


Problem 18. Find an example of a bounded function whose improper Riemann
integral on $(-\infty, \infty)$ exists but whose integral with respect to one-dimensional
Lebesgue measure does not exist. Also find a function whose integral with respect
to Lebesgue measure exists, but which is not Riemann integrable on some bounded
interval. Hint: For the first part, look at (Counter)example 4 of Chapter 4.


In view of Theorem 14, the Monotone and Dominated Convergence Theorems
are available for the study of Riemann integrals. However, one must be
cautious in using these theorems for the general theory, because the limit of a 
sequence of Riemann integrable functions may be a function without a Riemann 
integral, even if it is bounded by a constant and is equal to 0 outside some closed 
interval. The Dominated Convergence Theorem plays a role in the following 
sequence of exercises, designed to yield an asymptotic formula for the gamma 
function. 


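Problem 19. Use the substitution $u = \gamma + v\sqrt{\gamma}$, suggested by the fact that the
mean and standard deviation of a gamma distribution are $\gamma$ and $\sqrt{\gamma}$, respectively,
to obtain the following formula for the gamma function $\Gamma$:


\[ \Gamma(\gamma) = \gamma^{\gamma - (1/2)} e^{-\gamma} \int_{-\sqrt{\gamma}}^{\infty} \big( 1 + v \gamma^{-1/2} \big)^{\gamma - 1} e^{-v\sqrt{\gamma}} \, dv. \]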


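Problem 20. Use the preceding exercise to show that


\[ \frac{\Gamma(\gamma)}{\gamma^{\gamma - (1/2)} e^{-\gamma}} = \int_{-\infty}^{0} \varphi_\gamma(v) \, dv + \int_{0}^{\infty} \theta_\gamma(v) \, dv, \]


where

\[ \varphi_\gamma(v) = \begin{cases} \big( 1 + v \gamma^{-1/2} \big)^{\gamma - 1} e^{-v\sqrt{\gamma}} & \text{if } -\gamma^{1/2} < v \le 0 \\ 0 & \text{otherwise} \end{cases} \]


and


\[ \theta_\gamma(v) = \begin{cases} \big( 1 + v \gamma^{-1/2} \big)^{\gamma - 1} e^{-v\sqrt{\gamma}} & \text{if } v > 0 \\ 0 & \text{otherwise.} \end{cases} \]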


Problem 21. Prove that the function


\[ x \mapsto \frac{x - \log(1 + x)}{x^2}, \qquad x > -1, \]

is decreasing.


* Problem 22. Use the Dominated Convergence Theorem and Theorem 14 to prove
that

\[ \lim_{\gamma\to\infty} \int_{0}^{\infty} \theta_\gamma(v) \, dv = \int_{0}^{\infty} e^{-v^2/2} \, dv, \]


with $\theta_\gamma$ as defined in Problem 20. Hint: Use the preceding problem at an appropriate
point.


Problem 23. Use Theorem 14 to prove that


\[ \lim_{\gamma\to\infty} \int_{-\infty}^{0} \varphi_\gamma(v) \, dv = \int_{-\infty}^{0} e^{-v^2/2} \, dv, \]


with $\varphi_\gamma$ as defined in Problem 20. Hint: For $\gamma > 2$, the Taylor Formula can be
used to show that the Dominated Convergence Theorem is applicable.


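The next result gives an asymptotic formula for the gamma function. We write
‘$f(\gamma) \sim g(\gamma)$ as $\gamma \to \infty$’ to mean that $f(\gamma)/g(\gamma) \to 1$ as $\gamma \to \infty$. Table 8.1 shows
this formula gives a surprisingly good approximation to the gamma function,
even for small values of the argument.


Theorem 15. [Stirling Formula] The gamma function $\Gamma$ satisfies

\[ \Gamma(\gamma) \sim \sqrt{2\pi} \, \gamma^{\gamma - (1/2)} e^{-\gamma} \quad \text{as } \gamma \to \infty. \]


In particular,

\[ n! \sim \sqrt{2\pi n} \, n^n e^{-n} \quad \text{as } n \to \infty. \]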
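
The quality of the approximation is easy to examine numerically. The Python sketch below compares `math.gamma` with the Stirling expression and reports the relative error; the particular arguments $\gamma$ in the loop are our own choices for illustration, and the function names are hypothetical.

```python
import math

def stirling(g):
    """The Stirling approximation sqrt(2*pi) * g**(g - 1/2) * exp(-g)."""
    return math.sqrt(2.0 * math.pi) * g ** (g - 0.5) * math.exp(-g)

def relative_error(g):
    """(approximation - exact) / exact, with the exact value from math.gamma."""
    exact = math.gamma(g)
    return (stirling(g) - exact) / exact

for g in (0.5, 1.5, math.pi, 4.0, 6.0, 10.0):
    print(f"{g:8.4f} {math.gamma(g):14.4f} {stirling(g):14.4f} {relative_error(g):8.3f}")
```

The approximation is always a little too small, and the relative error shrinks roughly like $-1/(12\gamma)$ as $\gamma$ grows.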


Problem 24. Prove the preceding theorem. Hint: Use Problem 20, Problem 22, 
and Problem 23. 


The following sequence of problems gives further practice in using the convergence
theorems in combination with other techniques. In the process it will be
shown that $\Gamma'(1)$ is the negative of ‘Euler’s constant’.


[Table body damaged in extraction. Surviving entries of the row for $\sqrt{2\pi}\,\gamma^{\gamma-(1/2)} e^{-\gamma}$: 1.520, 359869.5.]

relative error: −0.142 | −0.053 | −0.026 | −0.021 | −0.014 | −0.008


TABLE 8.1. Gamma function, Stirling Formula, and relative error

Problem 25. Let $\Gamma$ denote the gamma function. Prove that


\[ \Gamma'(1) = \int_0^\infty (\log x) \, e^{-x} \, dx = \frac{1}{n!} \int_0^\infty (\log x) \, x^n e^{-x} \, dx - \sum_{k=1}^{n} \frac{1}{k} \]


for each positive integer $n$. Hint: Use mathematical induction and integration by
parts.


* Problem 26. Prove that 


n+ Yn? 
lim = J (log Zg” é drs 


n= 00 n! 


Problem 27. Prove that 
1 of" Ynez 
lim ~= (log Zg” e” dr=0. 
noo n: 0 n 


Hint: Show that integrand is negative and decreasing. 


Problem 28. Prove that 


Problem 29. Use the preceding four exercises to prove that

\[ \int_0^\infty (\log x) \, e^{-x} \, dx = -C, \]


where $C = 0.577\ldots$ is Euler’s constant defined by


\[ C = \lim_{n\to\infty} \Big( \sum_{k=1}^{n} \frac{1}{k} - \log n \Big). \]


Decide whether your proof establishes the existence of this last limit. If not, supply
appropriate additional arguments.


8.5. Absolute continuity and densities 


Riemann-Stieltjes integration is not always easy, so it is desirable to supplement 
Theorem 14 with further computational techniques. For this purpose, we use 
the density concept, which was introduced briefly in Chapter 3. 


Definition 16. Let $(\Omega, \mathcal{F})$ be a measurable space, and let $\mu$ and $\nu$ be two
measures on $(\Omega, \mathcal{F})$. We say that $\nu$ is absolutely continuous with respect to $\mu$,
written $\nu \ll \mu$, if $\nu(A) = 0$ for every $A \in \mathcal{F}$ for which $\mu(A) = 0$.


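Proposition 17. Let $(\Omega, \mathcal{F}, \mu)$ be a measure space and let $f$ be an $\mathbb{R}^+$-valued
measurable function. For $A \in \mathcal{F}$ define


\[ \nu(A) = \int_A f \, d\mu. \]


Then $\nu$ is a $\sigma$-finite measure on $(\Omega, \mathcal{F})$ satisfying $\nu \ll \mu$.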


PROOF. Clearly $\nu$ is nonnegative. Countable additivity follows from the
Monotone Convergence Theorem, as shown by the following computation:


\[ \nu\Big( \bigcup_{n=1}^{\infty} B_n \Big) = \int f \, I_{\bigcup_n B_n} \, d\mu = \int \sum_{n=1}^{\infty} f I_{B_n} \, d\mu
 = \sum_{n=1}^{\infty} \int f I_{B_n} \, d\mu = \sum_{n=1}^{\infty} \nu(B_n), \]


where we have assumed that the sets $B_n$ are pairwise disjoint. Since $f I_A = 0$
$\mu$-a.e. if $\mu(A) = 0$, the fact that $\nu$ is absolutely continuous with respect to $\mu$
follows from property (ii) of Theorem 5.

We will show that $\nu$ is $\sigma$-finite under the assumption that $\mu$ is finite. The
extension to the case in which $\mu$ is $\sigma$-finite is straightforward. For each $n =
1, 2, \dots$, let $B_n = \{x : n - 1 \le f(x) < n\}$. For $A \in \mathcal{F}$, define


\[ \nu_n(A) = \int_A f I_{B_n} \, d\mu. \]


The argument in the first paragraph of this proof shows that $\nu_n$ is a measure for
each $n$. Since $\mu$ is assumed to be finite, and since $f$ is bounded on each of the
sets $B_n$, it is clear that each $\nu_n$ is finite. The Monotone Convergence Theorem
implies that $\nu(A) = \sum_n \nu_n(A)$, so $\nu$ is $\sigma$-finite, as desired. □


The preceding proposition has a converse which is important in measure theory
but which plays a minor role in this book. For completeness, we state this
converse here. The proof appears in Chapter 23 as an application of ‘conditional
expectation’.




Theorem 18. [Radon-Nikodym] Let $\mu$ and $\nu$ be $\sigma$-finite measures defined on
a common measurable space $(\Omega, \mathcal{F})$ and satisfying $\nu \ll \mu$. Then there exists an
$\mathbb{R}^+$-valued measurable function $f$ such that


\[ \nu(A) = \int_A f \, d\mu \]

for all $A \in \mathcal{F}$.


The function $f$ in the preceding theorem is called the density or the
Radon-Nikodym derivative of $\nu$ with respect to $\mu$. We write


\[ f = \frac{d\nu}{d\mu}. \]


Problem 30. Justify the use of the word ‘the’ in the preceding statement by showing
that if $f$ and $g$ are both densities of $\nu$ with respect to $\mu$, then $f = g$ $\mu$-a.e.


If $\nu$ has a density with respect to $\mu$, we can use this density to compute the
Lebesgue integral of a function with respect to $\nu$ in terms of an integral with
respect to $\mu$.


Proposition 19. Let $\mu$ and $\nu$ denote $\sigma$-finite measures defined on a common
measurable space, and suppose that $d\nu/d\mu$ exists. Then, for every measurable
$\overline{\mathbb{R}}$-valued function $g$ defined on $(\Omega, \mathcal{F})$,


\[ \int g \, d\nu = \int g \, \frac{d\nu}{d\mu} \, d\mu, \]


where the product $g(x)(d\nu/d\mu)(x)$ is understood to equal 0 if either factor equals
0, even if the other factor equals $\infty$. The assertion is that if either side exists,
then both sides exist and are equal.
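
On a finite space with measures given by point weights, this identity reduces to a rearrangement of a finite sum. The Python sketch below checks it for hypothetical weights and also exercises the convention that a product is 0 when either factor is 0; the helper `times` and all the numbers are illustrative assumptions.

```python
import math

def times(a, b):
    """Product with the convention that 0 times anything, even infinity, is 0."""
    return 0.0 if a == 0 or b == 0 else a * b

# Hypothetical point weights for mu and a density d(nu)/d(mu).
mu = {"a": 1.0, "b": 2.0, "c": 0.5}
density = {"a": 3.0, "b": 0.0, "c": 4.0}
# nu(A) is the integral of the density over A; here, one weight per point.
nu = {w: times(density[w], mu[w]) for w in mu}

# A test function that is infinite exactly where the density vanishes.
g = {"a": 2.0, "b": math.inf, "c": -1.0}

lhs = sum(times(g[w], nu[w]) for w in mu)                       # integral of g d(nu)
rhs = sum(times(times(g[w], density[w]), mu[w]) for w in mu)    # integral of g * density d(mu)
print(lhs, rhs)  # 4.0 4.0
```

Without the convention, the point `b` would contribute the undefined product $\infty \cdot 0$ to both sums.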


Problem 31. Prove the preceding proposition. Hint: First consider simple func- 
tions g. 


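Proposition 20. [Chain Rule] Let $\lambda$, $\mu$, and $\nu$ be $\sigma$-finite measures defined
on a common measurable space. Suppose that $d\nu/d\mu$ and $d\mu/d\lambda$ exist. Then


\[ \frac{d\nu}{d\lambda} = \frac{d\nu}{d\mu} \, \frac{d\mu}{d\lambda} \quad \lambda\text{-a.e.}, \]


part of this conclusion being that $d\nu/d\lambda$ exists.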


Problem 32. Prove the preceding proposition. 




Problem 33. [Reciprocal Rule] Let $\mu$, $\nu$ be $\sigma$-finite measures for which $\nu$ has a
density $\varphi$ with respect to $\mu$. Prove that if $\varphi$ is $\mu$-a.e. nonzero, then $1/\varphi$ (defined
to be some arbitrary constant for $x$ such that $\varphi(x) = 0$) is the density of $\mu$ with
respect to $\nu$.


Proposition 19 is especially useful for calculations when $\mu$ is Lebesgue measure
on R. We have already seen this in the applications of Definition 9 of Chap- 
ter 3. The word ‘density’ there is consistent with our present usage. However, 
even in the special case of a probability space, our current definition is a strict 
generalization of Definition 9 of Chapter 3, since we now allow densities which 
are Lebesgue integrable even if they are not Riemann integrable. 


Problem 34. Let $X$ be a random variable with the gamma distribution, with
parameters $a, \gamma > 0$. In Example 3 of Chapter 4, we showed that $E(X) = \gamma/a$. In
the calculation, some effort was needed to accommodate $\gamma < 1$. Use aspects of this
section to describe another way of accommodating $\gamma < 1$.


* Problem 35. Let $\mu$ be a Radon measure on $(\mathbb{R}, \mathcal{B})$ with distribution function $F$.
Prove that if $F' = f$ exists and is Riemann integrable on every bounded interval,
then $\mu$ is absolutely continuous with respect to Lebesgue measure on $\mathbb{R}$, and the
density of $\mu$ is $f$. (The result is still true without the hypothesis that $f$ be Riemann
integrable on bounded intervals, but the proof is considerably more difficult.)


In the preceding section of this chapter, we studied the gamma function by 
a combination of Riemann integration theory and Lebesgue integration theory. 
In particular we used a change of variables in a Riemann integral. The next 
proposition, which does not require Riemann integrability, shows that we could 
have made the change of variables in the Lebesgue setting instead. 


Proposition 21. [Change of Variables] Let $\varphi$ be a strictly increasing
differentiable function from an interval $J$ onto an interval $K$, and let $\lambda$ be Lebesgue
measure on $\mathbb{R}$. For $f$ a measurable function from $K$ to $\mathbb{R}$,

$$\int_K f \, d\lambda = \int_J (f \circ \varphi) \, \varphi' \, d\lambda$$

in the sense that if one side exists, then both exist and are equal.
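A numerical sanity check of the change of variables formula is sketched below.
The choices $\varphi(x) = x^2$ on $J = [1, 2]$ (so that $K = [1, 4]$) and $f(y) = 1/y$
are hypothetical, and the midpoint rule is only a stand-in for the Lebesgue
integral, which is legitimate here because both integrands are continuous.
Both sides should be close to $\log 4$.

```python
import math

def midpoint_integral(h, a, b, n=100_000):
    """Midpoint-rule approximation to the integral of h over [a, b]."""
    step = (b - a) / n
    return step * sum(h(a + (i + 0.5) * step) for i in range(n))

phi = lambda x: x * x          # strictly increasing and differentiable on J = [1, 2]
phi_prime = lambda x: 2 * x    # derivative of phi
f = lambda y: 1 / y            # a function on K = phi(J) = [1, 4]

# Left side: integral of f over K.
left = midpoint_integral(f, 1, 4)
# Right side: integral of (f o phi) * phi' over J.
right = midpoint_integral(lambda x: f(phi(x)) * phi_prime(x), 1, 2)
print(left, right, math.log(4))
```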


Problem 36. Prove the preceding proposition. Hint: Show that $\varphi'$ is the density
with respect to Lebesgue measure of the measure induced by $\varphi^{-1}$ on $J$.


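Problem 37. Let $f$ be an $\overline{\mathbb{R}}$-valued measurable function defined on $(\mathbb{R}, \mathcal{B})$. Let
$g(x) = f(-x)$, and $h(x) = f(x + c)$, where $c$ is a fixed constant. Show that

$$\int g \, d\lambda = \int h \, d\lambda = \int f \, d\lambda,$$

where $\lambda$ is Lebesgue measure on $\mathbb{R}$.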




Problem 38. Show that the Cantor distribution is not absolutely continuous with 
respect to Lebesgue measure on R. 


8.6. Integration with respect to counting measure 


Lebesgue integration reduces to summation when the underlying measure is 
counting measure. 


Proposition 22. Let $\mu$ be counting measure on a measurable space $(\Omega, \mathcal{F})$,
where $\mathcal{F}$ consists of all subsets of $\Omega$, and let $f$ be an $\overline{\mathbb{R}}$-valued function defined
on $\Omega$. Define

$$S^{\pm} = \Big\{ \sum_{x \in A} f^{\pm}(x) : A \text{ is a finite subset of } \Omega \Big\}.$$

Then

$$\int f \, d\mu = \sup S^+ - \sup S^-,$$

in the sense that if either side exists, then so does the other and they are equal.
In addition, if $\Omega$ is countable and $\int f \, d\mu$ exists, then

$$\int f \, d\mu = \sum_{x \in \Omega} f(x).$$

The order of the terms in this summation does not affect the value of the sum.
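The two descriptions of the integral in Proposition 22 can be compared on a
small example. This is only a sketch: a finite set stands in for a countable
space, and the function $f(x) = (-1/2)^x$ is a hypothetical choice. For
nonnegative terms the supremum over finite subsets is attained by summing over
the whole finite set.

```python
from fractions import Fraction
import random

omega = range(20)                       # a finite stand-in for a countable space
f = lambda x: Fraction(-1, 2) ** x      # values alternate in sign

f_plus = lambda x: max(f(x), 0)         # positive part of f
f_minus = lambda x: max(-f(x), 0)       # negative part of f

# For nonnegative terms, the supremum over finite subsets A is attained at A = omega.
sup_plus = sum(f_plus(x) for x in omega)
sup_minus = sum(f_minus(x) for x in omega)
integral = sup_plus - sup_minus

# Summing f directly, in a shuffled order, gives the same value.
points = list(omega)
random.Random(0).shuffle(points)
shuffled_sum = sum(f(x) for x in points)
print(integral, shuffled_sum)
```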


Problem 39. Prove the preceding result. Hint: See Problem 18 of Chapter 6. 


Problem 40. Let $\mu$ be counting measure on a countable space $\Omega$. Suppose that $f$
is $\mathbb{R}$-valued and $\int f \, d\mu$ does not exist. Investigate the effect that the order of terms
in the sum

$$\sum_{x \in \Omega} f(x)$$

has on its existence and possible value.


Problem 41. Let $\mu$ be counting measure on a space $\Omega$. Show that if $\int f \, d\mu$ is finite,
then $\{x : f(x) \neq 0\}$ is a countable set. Hint: Use Proposition 4.


Problem 42. State the Dominated Convergence Theorem, the Fatou Lemma, and 
the Cauchy-Schwarz Inequality for sums. 


Problem 43. On $\mathbb{Z}$, let $\mu$ be counting measure and $\nu$ an arbitrary $\sigma$-finite measure.
Find a formula in terms of $\nu$ and $\mu$ for the density $g$ of $\nu$ with respect to $\mu$. For
functions $f : \mathbb{Z} \to \mathbb{R}$, find a summation formula for $\int f \, d\nu$ in terms of $f$ and $g$.


Problem 44. Interpret certain quantities found in Table 5.1 of Chapter 5 as den- 
sities with respect to counting measure on Z. 


PART 2 


Independence and Sums 




In this part of the book, we introduce a very important concept: ‘stochastic 
independence’. Roughly speaking, two random experiments are independent if 
knowledge of the outcome of one of the experiments does not affect one’s assess- 
ment about the distribution of outcomes in the other experiment. This notion 
permeates probability theory, and in so doing distinguishes probability theory 
from general measure theory. Even random objects that are not independent are 
often analyzed in terms of related structures that are independent. An attrac- 
tive feature of probability theory lies in its ability to model such an important 
heuristic notion with mathematical precision. 

The mathematical definition of independence involves ‘product measure’, to 
be defined in Chapter 9. Calculations involving product measure are facilitated 
by the use of a famous theorem from integration theory, the Fubini Theorem. 
Other basic definitions and results and several examples are also found in Chap- 
ter 9. 

Many applications of probability theory involve sums of independent random 
variables, and these form the subject matter of most of the rest of Part 2. Gen- 
eral definitions and examples are found in Chapter 10. Random walks, which 
involve sums of identically distributed independent random variables, are stud- 
ied in Chapter 11. Chapter 12 contains several convergence results related to 
sequences and sums of independent random variables, including the Strong Law 
of Large Numbers. Chapter 13 introduces an important tool known as the ‘char- 
acteristic function’. This tool is useful in analyzing the distributions of sums of 
independent random variables. Some interesting applications of characteristic 
functions will be given in Chapter 13, but their real power will be revealed in 
Part 3. 


It will sometimes be convenient to have an alternate notation for integrals,
one which makes explicit the variable of integration. The new notation for $\int f \, d\mu$
is

$$\int f(x) \, \mu(dx).$$


This notation will be particularly useful when we work with iterated integrals. 
See, for example, the statement of the Fubini Theorem in Chapter 9. 


CHAPTER 9 
Stochastic Independence 


The first six sections of this chapter describe the measure-theoretic foundation 
for ‘stochastic independence’: products of probability spaces. After giving the 
basic definitions, we prove the existence of ‘product measure’ and also give an 
important result concerning integration with respect to product measure (the 
Fubini Theorem). Important relations among expectations, independence, and 
densities are described. The last three sections of the chapter do not depend on 
each other. The first treats the asymptotic behavior of sequences of independent 
identically distributed random variables. The second concerns ‘order statistics’ 
of finite sequences of such random variables. The last introduces some new 
distributions. 


9.1. Definition and basic properties 
We begin with a general example intended to establish the basic ideas behind
our definitions.


Example 1. Consider two experiments, represented by probability spaces
$(\Omega_1, \mathcal{F}_1, P_1)$ and $(\Omega_2, \mathcal{F}_2, P_2)$. The product space $\Omega = \Omega_1 \times \Omega_2$ is a natural
sample space to use for a compound experiment in which both are performed.
The coordinate maps $X_i : \Omega \to \Omega_i$, defined by

$$X_i(\omega_1, \omega_2) = \omega_i, \quad i = 1, 2,$$

link the compound experiment to the two original experiments.
For $A_1 \in \mathcal{F}_1$, let

$$Q_1(A_1 \times \Omega_2) = P_1(A_1).$$

This defines a probability measure on the measurable space $(\Omega, \mathcal{G}_1)$, where

$$\mathcal{G}_1 = \{A_1 \times \Omega_2 : A_1 \in \mathcal{F}_1\}.$$




The probability spaces $(\Omega, \mathcal{G}_1, Q_1)$ and $(\Omega_1, \mathcal{F}_1, P_1)$ are equivalent from a
theoretical point of view, and hence either one can be used to model the first
experiment. In analogous fashion, we can define a probability space $(\Omega, \mathcal{G}_2, Q_2)$, with
$\mathcal{G}_2 = \{\Omega_1 \times A_2 : A_2 \in \mathcal{F}_2\}$ and $Q_2(\Omega_1 \times A_2) = P_2(A_2)$, which is equivalent to
the probability space originally representing the second experiment.
Let

$$\mathcal{F} = \sigma(\mathcal{G}_1, \mathcal{G}_2),$$

the smallest $\sigma$-field containing both $\mathcal{G}_1$ and $\mathcal{G}_2$. We are interested in defining
a probability measure $P$ on $\mathcal{F}$ that agrees with $Q_1$ on $\mathcal{G}_1$ and with $Q_2$ on $\mathcal{G}_2$.
There are typically many such measures $P$, each of which models a compound
experiment involving the two original experiments. The choice of $P$ reflects the
relationship between the two original experiments.

In this chapter we will construct a particular $P$ that models the situation in
which the original two experiments have no influence on each other. Let $\mathcal{R}$ be
the collection of all ‘measurable rectangles’, that is,

$$\mathcal{R} = \{A_1 \times A_2 : A_1 \in \mathcal{F}_1 \text{ and } A_2 \in \mathcal{F}_2\}.$$

Note that

$$\sigma(\mathcal{R}) = \mathcal{F} = \sigma(\mathcal{G}_1, \mathcal{G}_2),$$

since every measurable rectangle $A_1 \times A_2$ is the intersection of the set $A_1 \times \Omega_2 \in \mathcal{G}_1$
and the set $\Omega_1 \times A_2 \in \mathcal{G}_2$.
We define $P$ for sets in $\mathcal{R}$ by

$$P(A_1 \times A_2) = P_1(A_1) P_2(A_2).$$

Note that this definition implies that

$$P(B_1 \cap B_2) = P(B_1) P(B_2) \quad \text{for all } B_1 \in \mathcal{G}_1, \ B_2 \in \mathcal{G}_2, \tag{9.1}$$

since we may write $B_1 = A_1 \times \Omega_2$ and $B_2 = \Omega_1 \times A_2$ and calculate as follows:

$$P(B_1 \cap B_2) = P(A_1 \times A_2) = P_1(A_1) P_2(A_2) = P(A_1 \times \Omega_2) P(\Omega_1 \times A_2).$$

The relationship between $\mathcal{G}_1$ and $\mathcal{G}_2$ that is expressed in (9.1) is known as
‘stochastic independence’. This relationship has the interpretation that in the
combined experiment, the two original experiments do not affect each other.

Of course, we have not yet completed the process of combining the two original 
experiments in this example, because we have not defined P on all of F. We 
will prove in Theorem 7 that P can be extended to a probability measure on 
F. Since R is closed under pairwise intersections, the Uniqueness of Measure 
Theorem implies that this extension is unique. 
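When both sample spaces are finite, the construction in Example 1 can be
carried out explicitly. The sketch below assumes two hypothetical experiments
(a biased coin and a three-outcome spinner), defines $P$ through products of
point masses, and checks relation (9.1) for every $B_1 \in \mathcal{G}_1$ and $B_2 \in \mathcal{G}_2$.
The names `P1`, `P2`, `G1`, `G2` mirror the notation of the example but are
otherwise illustrative.

```python
from fractions import Fraction
from itertools import chain, combinations, product

def subsets(points):
    """All subsets of a finite set, as frozensets."""
    pts = list(points)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(pts, r) for r in range(len(pts) + 1))]

# Hypothetical experiments: a biased coin and a three-outcome spinner.
P1 = {"H": Fraction(1, 3), "T": Fraction(2, 3)}
P2 = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}

omega = list(product(P1, P2))                       # the product space
point_mass = {(w1, w2): P1[w1] * P2[w2] for (w1, w2) in omega}

def P(event):
    """Probability of a subset of the product space."""
    return sum(point_mass[w] for w in event)

G1 = [frozenset(product(A1, P2)) for A1 in subsets(P1)]   # sets A1 x Omega2
G2 = [frozenset(product(P1, A2)) for A2 in subsets(P2)]   # sets Omega1 x A2

# Relation (9.1): P(B1 & B2) = P(B1) P(B2) for all B1 in G1, B2 in G2.
independent = all(P(B1 & B2) == P(B1) * P(B2) for B1 in G1 for B2 in G2)
print(independent)
```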


* Problem 1. Match the situation described in Problem 10 of Chapter 1 with the
preceding example. For the situation in that problem decide how many members
each of the following sets has: $\Omega_i$, $\mathcal{F}_i$, $\mathcal{G}_i$, $\Omega$, $\mathcal{F}$, and $\mathcal{R}$.




We are now ready to give a formal definition of stochastic independence.
Notice that when this definition is applied to the pair $\mathcal{G}_1, \mathcal{G}_2$ in Example 1, it is
equivalent to (9.1).


Definition 1. Let $(\Omega, \mathcal{F}, P)$ denote a probability space and let $\mathcal{F}_k$, $k \in K$,
be sub-$\sigma$-fields of $\mathcal{F}$.

(i) If $K$ is finite, $(\mathcal{F}_k : k \in K)$ is stochastically independent (or independent
when there is no danger of confusion) if

$$P\Big( \bigcap_{k \in K} A_k \Big) = \prod_{k \in K} P(A_k) \tag{9.2}$$

for all $A_k \in \mathcal{F}_k$.
(ii) If $K$ is infinite, $(\mathcal{F}_k : k \in K)$ is stochastically independent if for all
finite sets $J \subseteq K$, $(\mathcal{F}_j : j \in J)$ is stochastically independent.


We also want to define independence for random variables and events. To do
so we speak of the $\sigma$-field generated by a $(\Psi, \mathcal{G})$-valued random variable $X$:

$$\sigma(X) = \{X^{-1}(B) : B \in \mathcal{G}\}.$$

Note that $\sigma(X)$ is the smallest $\sigma$-field with respect to which $X$ is measurable.
The $\sigma$-field generated by an event $A$ in a probability space $(\Omega, \mathcal{F}, P)$ is the $\sigma$-field
generated by the indicator function of $A$; it equals $\{\emptyset, A, A^c, \Omega\}$ and is denoted
by $\sigma(A)$. Independence for random variables and events is defined in terms of
independence of the $\sigma$-fields that they generate.
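On a finite sample space the $\sigma$-field generated by a random variable can be
listed directly as the collection of preimages. The following sketch assumes a
hypothetical nine-point sample space and a random variable with two values; the
value space carries the $\sigma$-field of all of its subsets.

```python
from itertools import chain, combinations

omega = [(i, j) for i in range(1, 4) for j in range(1, 4)]   # hypothetical 3 x 3 sample space
X = lambda w: (w[0] + w[1]) % 2                              # a random variable with values in {0, 1}

values = sorted({X(w) for w in omega})
value_subsets = chain.from_iterable(
    combinations(values, r) for r in range(len(values) + 1))

# sigma(X) consists of the preimages X^{-1}(B) as B ranges over subsets of the value space.
sigma_X = {frozenset(w for w in omega if X(w) in B) for B in value_subsets}
print(len(sigma_X))
```

Since $X$ takes only two values here, the resulting collection has the same
four-set shape as the $\sigma$-field generated by a single event.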


Definition 2. Let $(\Omega, \mathcal{F}, P)$ be a probability space, and let

$$M = (\mathcal{E}_j, X_k, A_l : j \in J, \ k \in K, \ l \in L)$$

consist of collections $\mathcal{E}_j \subseteq \mathcal{F}$ of events, random variables $X_k$ defined on $(\Omega, \mathcal{F}, P)$,
and events $A_l \in \mathcal{F}$. Then $M$ is said to be stochastically independent (or
independent when there is no danger of confusion) if

$$(\sigma(\mathcal{E}_j), \sigma(X_k), \sigma(A_l) : j \in J, \ k \in K, \ l \in L)$$

is stochastically independent.
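For events on a finite probability space, independence in the sense of the
definitions above can be tested by brute force: form the $\sigma$-field
$\{\emptyset, A, A^c, \Omega\}$ of each event and check (9.2) for every choice of one member from
each. The sketch below assumes a single roll of a fair die; the helper names
are hypothetical.

```python
from fractions import Fraction
from itertools import product

omega = frozenset(range(1, 7))          # one roll of a fair die

def P(event):
    return Fraction(len(event), len(omega))

def sigma_field_of_event(A):
    """The sigma-field {empty set, A, complement of A, omega}."""
    return [frozenset(), A, omega - A, omega]

def is_independent(fields):
    """Check condition (9.2) for every choice of one event from each sigma-field."""
    for events in product(*fields):
        intersection = omega
        prob_product = Fraction(1)
        for A in events:
            intersection &= A
            prob_product *= P(A)
        if P(intersection) != prob_product:
            return False
    return True

even = frozenset({2, 4, 6})
at_most_two = frozenset({1, 2})
at_most_three = frozenset({1, 2, 3})

print(is_independent([sigma_field_of_event(even), sigma_field_of_event(at_most_two)]))
print(is_independent([sigma_field_of_event(even), sigma_field_of_event(at_most_three)]))
```

The first pair is independent and the second is not, since
$P(\{2\}) = 1/6$ equals $(1/2)(1/3)$ but not $(1/2)(1/2)$.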


It is common to speak of two events A and B as being independent, or to say 
that A is independent of B, even though one really means that the pair (A, B) 
is independent; stochastic independence is not a property of the two events 
individually, but instead of the relationship between them. Such language is also 
often used for random variables and o-fields. (A similar lack of precision occurs 
when the term ‘linear independence’ is used in linear algebra.) 




Problem 2. For $k = 1, 2, \ldots$, let $X_k$ be a $(\Psi_k, \mathcal{G}_k)$-valued random variable, and
let $\varphi_k$ be a measurable function from $(\Psi_k, \mathcal{G}_k)$ to $(\Theta_k, \mathcal{H}_k)$. Prove that if the
sequence $(X_1, X_2, \ldots)$ is independent, then the sequence $(\varphi_1 \circ X_1, \varphi_2 \circ X_2, \ldots)$ is
independent. Hint: Show that $\sigma(\varphi_i \circ X_i) \subseteq \sigma(X_i)$ for each $i$.


Problem 3. Show that a pair of events $(A, B)$ is independent if and only if
$P(A \cap B) = P(A)P(B)$. Find an example of a triple $(A, B, C)$ of events that is not
independent but still satisfies $P(A \cap B \cap C) = P(A)P(B)P(C)$.


Problem 4. Let $(A_1, A_2, \ldots)$ be an independent sequence of events. Prove that

$$P\Big( \bigcap_{n=1}^{\infty} A_n \Big) = \prod_{n=1}^{\infty} P(A_n).$$


Problem 5. Consider three events for the fair coin-flip probability space of Exam- 
ple 4 of Chapter 1: 


$$A_1 = \{\omega : \omega_1 = 1\}, \quad A_2 = \{\omega : \omega_2 = 1\}, \quad A_3 = \{\omega : \omega_1 + \omega_2 = 1\}.$$


Show that any pair of them is independent, but that the triple (A1, A2, A3) is not. 
Describe this phenomenon in an intuitive manner. 


Problem 6. Let $(X_1, X_2, \ldots)$ be an independent sequence of $\mathbb{R}$-valued random
variables and $(F_1, F_2, \ldots)$ the corresponding sequence of distribution functions. Let
$Y(\omega)$ equal the greatest lower bound and $Z(\omega)$ the least upper bound of the set
$\{X_n(\omega) : n = 1, 2, \ldots\}$. Find a formula for the distribution functions of $Y$ and $Z$
in terms of the functions $F_n$. Comment on the situation if the random variables
$X_n$ are assumed to be $\overline{\mathbb{R}}$-valued.


Problem 7. Let $(X_1, X_2)$ be an independent pair of exponentially distributed
random variables with means $\lambda_1$ and $\lambda_2$, respectively. Calculate and name the
distribution of $X_1 \wedge X_2$.


Problem 8. Let (X1, X2) be an independent pair of random variables, each hav- 
ing the same distribution—beta with parameter GG, 1). Calculate and name the 
distribution of $X_1 \vee X_2$.


Problem 9. Let $(X_1, X_2, \ldots)$ be an independent sequence of identically distributed
$\mathbb{R}$-valued random variables. Let

$$U(\omega) = \liminf_{n \to \infty} X_n(\omega) \quad \text{and} \quad V(\omega) = \limsup_{n \to \infty} X_n(\omega).$$

Find the distribution of the ordered pair $(U, V)$ in terms of the common distribution
function $F$ of the $X_n$. Is the pair $(U, V)$ independent?


Problem 10. Let $(\Omega, \mathcal{F}, P)$ be a probability space, and for each $k$ in some finite
index set $K$, let $\mathcal{E}_k$ be a countable partition of $\Omega$. Prove that $(\mathcal{E}_k : k \in K)$ is
independent if condition (9.2) holds for all events $A_k \in \mathcal{E}_k$, $k \in K$.




Problem 11. Show that if $A$ is an event, then the pair $(A, A)$ is independent if and
only if $P(A) = 0$ or $1$. Also show that if $\mathcal{G}$ is a $\sigma$-field, then $(\mathcal{G}, \mathcal{G})$ is independent
if and only if $P(A) = 0$ or $1$ for all $A \in \mathcal{G}$.

Problem 12. Show that if $X$ is a random variable that takes values in $(\mathbb{R}^d, \mathcal{B}^d)$, then
$(X, X)$ is independent if and only if $X$ is a constant a.s. Give an example to show
that this statement fails to be true in general if $X$ is merely assumed to take values
in some measurable space $(\Psi, \mathcal{G})$. Hint: For the first part, use the previous exercise
to show that if $(X, X)$ is independent, the distribution function of each component
of $X$ can only take the values 0 and 1. For the example, take $\mathcal{G}$ to be the trivial
$\sigma$-field $\{\Psi, \emptyset\}$.


Problem 13. Let $\Omega$ consist of the 36 ordered pairs $\omega = (\omega_1, \omega_2)$, where $1 \le \omega_i \le 6$.
Let $\mathcal{F}$ denote the $\sigma$-field consisting of all $2^{36}$ subsets of $\Omega$. Set $P(A) = \#A/36$ for
each $A \in \mathcal{F}$. This is the typical sample space used for the experiment of rolling
two fair dice. Let


Xilw) =w, Xelw)=we, X3(w)=(-1), Xa(w) =5, 

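$$A_1 = \{\omega : \omega_1 \text{ is divisible by } 3\}, \quad A_2 = \{\omega : \omega_1 \text{ is divisible by } 2\},$$

$$A_3 = \{\omega : X_1(\omega) + X_2(\omega) = 7\}, \quad A_4 = \{\omega : X_1(\omega) + X_2(\omega) \text{ is odd}\},$$

$$\mathcal{E} = \{\{(\omega_1, \omega_2), (\omega_2, \omega_1)\} : \omega \in \Omega\}.$$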


Consider the sequence 
M = (Xo, X3, X4, X1 + Xa, (X1, X2), Ai, Ao, A3, Aa, Q, E) : 


Which subsequences are independent? Hint: Any further subsequence of an in- 
dependent subsequence is independent, so one only needs to be concerned with 
independent subsequences that are maximal in length. 


The next three propositions are useful for checking independence. The first 
one generalizes Problem 10. 


Proposition 3. Let $(\Omega, \mathcal{F}, P)$ denote a probability space, and let $M =
(\mathcal{E}_l : l \in L)$ consist of subcollections $\mathcal{E}_l$ of $\mathcal{F}$, each one of which is closed
under pairwise intersections. Then $M$ is independent if and only if (9.2) holds for
all finite sets $K \subseteq L$ and events $A_k \in \mathcal{E}_k$, $k \in K$.


* Problem 14. Prove the preceding proposition in the case that L has only 2 mem- 
bers. Hint: Apply the Sierpiński Class Theorem of Chapter 7 twice. (Generalizing 
the proof to treat arbitrary L is straightforward but notationally messy.) 


Problem 15. Use Proposition 3 to obtain a simple criterion that a sequence of 
events be independent. 




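When we are dealing with a collection or sequence of $\sigma$-fields, it is useful to
have some streamlined notation for the smallest $\sigma$-field containing the collection
or sequence. We write

$$\sigma(\mathcal{F}_k : k \in K) \quad \text{and} \quad \sigma(\mathcal{F}_1, \mathcal{F}_2, \ldots)$$

for

$$\sigma\Big( \bigcup_{k \in K} \mathcal{F}_k \Big) \quad \text{and} \quad \sigma\Big( \bigcup_{k=1}^{\infty} \mathcal{F}_k \Big).$$

Similar notation is used with events, collections of events, and random variables.
For example, we write $\sigma(X, Y)$ for $\sigma(\sigma(X) \cup \sigma(Y))$.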


Proposition 4. Let $K$ and $M$ denote sets and let $\{K_m, m \in M\}$ be a partition
of $K$. Let $(\Omega, \mathcal{F}, P)$ be a probability space, and let $\mathcal{F}_k$, $k \in K$, be sub-$\sigma$-fields
of $\mathcal{F}$. Suppose that $(\mathcal{F}_k : k \in K)$ is independent. Then $(\sigma(\mathcal{F}_k : k \in K_m) : m \in M)$
is independent.


PROOF. By the definition of independence, it is enough to prove the result
for finite sets $M$. For each $m \in M$, let $\mathcal{G}_m = \sigma(\mathcal{F}_k : k \in K_m)$ and let $\mathcal{E}_m$ be the
collection of all finite intersections of members of

$$\bigcup_{k \in K_m} \mathcal{F}_k.$$

Each $\mathcal{E}_m$ is closed under pairwise intersections and $\mathcal{G}_m = \sigma(\mathcal{E}_m)$. Therefore, by
Proposition 3, it is enough to check that

$$P\Big( \bigcap_{m \in M} A_m \Big) = \prod_{m \in M} P(A_m)$$

for all events $A_m \in \mathcal{E}_m$, $m \in M$. If $A_m$ is an event in $\mathcal{E}_m$, then there exist events
$B_k \in \mathcal{F}_k$ for $k \in K_m$ such that $B_k = \Omega$ for all but finitely many $k \in K_m$ and

$$A_m = \bigcap_{k \in K_m} B_k.$$

(Since the sets $K_m$ are disjoint, there is no need to put additional subscripts on
the events $B_k$ to distinguish them for different values of $m$.) Since $(\mathcal{F}_k, k \in K)$
is independent and $B_k = \Omega$ for all but finitely many $k \in K$,

$$P\Big( \bigcap_{m \in M} A_m \Big) = P\Big( \bigcap_{m \in M} \bigcap_{k \in K_m} B_k \Big) = P\Big( \bigcap_{k \in K} B_k \Big) = \prod_{k \in K} P(B_k)$$

$$= \prod_{m \in M} \prod_{k \in K_m} P(B_k) = \prod_{m \in M} P\Big( \bigcap_{k \in K_m} B_k \Big) = \prod_{m \in M} P(A_m),$$

where we have used (9.2) twice and also have used the fact (or convention) that
infinitely many (not necessarily countably many) factors equaling 1 in a product
do not affect the value of that product. $\square$




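Proposition 5. Let $(\Omega, \mathcal{F}, P)$ be a probability space, and let $\mathcal{F}_k$, $k = 1, 2, \ldots$,
be sub-$\sigma$-fields of $\mathcal{F}$. Then the sequence $(\mathcal{F}_k : k = 1, 2, \ldots)$ is independent if and
only if each of the pairs $(\sigma(\mathcal{F}_1, \ldots, \mathcal{F}_n), \mathcal{F}_{n+1})$ is independent for $n = 1, 2, \ldots$.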


Problem 16. Prove Proposition 5. 


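Problem 17. Let $X_1, \ldots, X_n$ be $\mathbb{R}$-valued random variables. Show that the
sequence $(X_1, \ldots, X_n)$ is independent if and only if

$$P(\{\omega : X_1(\omega) \le a_1, \ldots, X_n(\omega) \le a_n\}) = \prod_{i=1}^{n} P(\{\omega : X_i(\omega) \le a_i\})$$

for all $a_1, \ldots, a_n \in \mathbb{R}$. Also show that if the random variables $X_i$ are $\mathbb{Z}$-valued,
then $(X_1, \ldots, X_n)$ is independent if and only if

$$P(\{\omega : X_1(\omega) = k_1, \ldots, X_n(\omega) = k_n\}) = \prod_{i=1}^{n} P(\{\omega : X_i(\omega) = k_i\})$$

for all $k_1, \ldots, k_n \in \mathbb{Z}$.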


Problem 18. Prove that if the triple (X1, X2, X3) of R-valued random variables is 
independent, then so is the pair (X1, X2 + X3). Also find an example for which 
(X1, X2 + X3) is independent, but (X1, X2, X3) is not. 


Example 2. For the coin-flip probability space of Example 1 of Chapter 7,
let $X_n(\omega) = \omega_n$ and $\mathcal{F}_n = \sigma(X_n)$. It is straightforward to check that $(\mathcal{F}_1, \mathcal{F}_2, \ldots)$
is an independent sequence of $\sigma$-fields. Equivalently, the sequence $(X_1, X_2, \ldots)$
is independent, which is often expressed more informally by saying that the
random variables $X_n$ are independent. By Proposition 4 we can conclude such
things as: the number of heads in the first ten flips is independent of the number
of heads in the eleventh through eighteenth flips.
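The kind of conclusion drawn in Example 2 can be verified exhaustively on a
small scale. The sketch below is an assumption-laden miniature: it uses only
six fair flips instead of an infinite sequence, and compares the number of
heads in flips 1 to 3 with the number of heads in flips 4 to 6.

```python
from fractions import Fraction
from itertools import product

n = 6
outcomes = list(product([0, 1], repeat=n))       # all sequences of n fair coin flips
p = Fraction(1, len(outcomes))                   # each sequence is equally likely

first = lambda w: sum(w[:3])                     # heads among flips 1 to 3
later = lambda w: sum(w[3:])                     # heads among flips 4 to 6

def prob(predicate):
    return sum(p for w in outcomes if predicate(w))

# Joint probabilities factor into products of marginal probabilities.
independent = all(
    prob(lambda w: first(w) == a and later(w) == b)
    == prob(lambda w: first(w) == a) * prob(lambda w: later(w) == b)
    for a in range(4) for b in range(4)
)
print(independent)
```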


9.2. Product measure: finitely many factors 


Example 1 and Example 2 indicate that product spaces can be used in a nat- 
ural way in the construction of independent random variables. The key to the 
construction is to carry out the extension that was advertised in Example 1. In 
this section we will first do this for the product of a pair of probability spaces 
and then in a problem have the reader extend the construction to the product 
of a finite number of probability spaces. In a subsequent section we will treat 
countably many factors. 

The first definition provides terminology for some of the objects that were 
already introduced in Example 1. 




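Definition 6. Let $(\Omega_1, \mathcal{F}_1)$ and $(\Omega_2, \mathcal{F}_2)$ be measurable spaces, and let $\Omega =
\Omega_1 \times \Omega_2$. Let $\mathcal{R}$ be the collection of measurable rectangles in $\Omega$:

$$\mathcal{R} = \{A_1 \times A_2 : A_1 \in \mathcal{F}_1 \text{ and } A_2 \in \mathcal{F}_2\}.$$

The $\sigma$-field

$$\mathcal{F} = \sigma(\mathcal{R}),$$

denoted by

$$\mathcal{F}_1 \times \mathcal{F}_2,$$

is called the product $\sigma$-field of $\mathcal{F}_1$ and $\mathcal{F}_2$. The measurable space $(\Omega, \mathcal{F})$, denoted
by

$$(\Omega_1, \mathcal{F}_1) \times (\Omega_2, \mathcal{F}_2),$$

is called the product of $(\Omega_1, \mathcal{F}_1)$ and $(\Omega_2, \mathcal{F}_2)$.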


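Theorem 7. Let $(\Omega_1, \mathcal{F}_1, \mu_1)$ and $(\Omega_2, \mathcal{F}_2, \mu_2)$ be $\sigma$-finite measure spaces.
There exists a unique measure $\mu$ on the measurable space $(\Omega_1, \mathcal{F}_1) \times (\Omega_2, \mathcal{F}_2)$
such that

$$\mu(A_1 \times A_2) = \mu_1(A_1) \mu_2(A_2) \tag{9.3}$$

for all $A_1 \in \mathcal{F}_1$ and $A_2 \in \mathcal{F}_2$. The measure $\mu$ is $\sigma$-finite. Moreover, if $\mu_1$ and
$\mu_2$ are probability measures, then so is $\mu$.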


PROOF. We give the proof in the case that $\mu_1$ and $\mu_2$ are finite. The extension
to the $\sigma$-finite case is straightforward (see Problem 19 of Chapter 6 and the
accompanying discussion), as is the proof that $\mu$ is $\sigma$-finite.

Since the collection of measurable rectangles is closed under pairwise
intersections and generates $\mathcal{F}_1 \times \mathcal{F}_2$, the uniqueness follows from the Uniqueness of
Measure Theorem, even though that theorem treats probability measures rather
than arbitrary finite measures. The remainder of the proof will be devoted to
the existence issue.

Consider the collection of $B \subseteq \Omega_1 \times \Omega_2$ for which the following function is
defined and measurable:

$$\omega_1 \mapsto \int_{\Omega_2} I_B(\omega_1, \omega_2) \, \mu_2(d\omega_2).$$

Clearly, it contains all measurable rectangles. By linearity of integration, it is
closed under proper differences; and by the Monotone Convergence Theorem, it
is closed under increasing limits. Therefore, by the Sierpiński Class Theorem, it
contains all members of $\mathcal{F}_1 \times \mathcal{F}_2$. Accordingly, we may define

$$\mu(B) = \int_{\Omega_1} \left( \int_{\Omega_2} I_B(\omega_1, \omega_2) \, \mu_2(d\omega_2) \right) \mu_1(d\omega_1) \tag{9.4}$$

for $B \in \mathcal{F}_1 \times \mathcal{F}_2$.
That $\mu$ is countably additive follows from a corollary of the Monotone
Convergence Theorem. For $B$ of the form $A_1 \times A_2$ for some $A_1 \in \mathcal{F}_1$ and $A_2 \in \mathcal{F}_2$,
$I_B(\omega_1, \omega_2) = I_{A_1}(\omega_1) I_{A_2}(\omega_2)$. Insertion of this product into (9.4) gives (9.3). In
particular, $\mu(\Omega_1 \times \Omega_2) = \mu_1(\Omega_1) \mu_2(\Omega_2) < \infty$. Therefore, $\mu$ is a finite measure,
and furthermore, it is a probability measure if both $\mu_1$ and $\mu_2$ are probability
measures. $\square$


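The measure $\mu$ defined in the preceding theorem is called the product measure
of $\mu_1$ and $\mu_2$. It is denoted by

$$\mu_1 \times \mu_2.$$

The measure space $(\Omega_1 \times \Omega_2, \mathcal{F}_1 \times \mathcal{F}_2, \mu_1 \times \mu_2)$ is called the product space of
$(\Omega_1, \mathcal{F}_1, \mu_1)$ and $(\Omega_2, \mathcal{F}_2, \mu_2)$, and is denoted by

$$(\Omega_1, \mathcal{F}_1, \mu_1) \times (\Omega_2, \mathcal{F}_2, \mu_2).$$

In Example 1, we were given probability spaces $(\Omega_i, \mathcal{F}_i, P_i)$, $i = 1, 2$, and then
we constructed the measurable space $(\Omega, \mathcal{F})$ as the product of the spaces $(\Omega_i, \mathcal{F}_i)$.
We also defined sub-$\sigma$-fields $\mathcal{G}_1, \mathcal{G}_2$ of $\mathcal{F}$ and a function $P$ on the collection of
measurable rectangles $\mathcal{R}$. We now see, according to Theorem 7, that the product
measure $P_1 \times P_2$ is the desired extension of $P$ to $\mathcal{F}$, and that the pair $(\mathcal{G}_1, \mathcal{G}_2)$ is
independent.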
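For finite measure spaces, the iterated integral used to define the product
measure in the proof of Theorem 7 is an iterated sum, and it can be evaluated
for sets that are not rectangles. The sketch below assumes two hypothetical
finite measures given by point masses; `product_measure` follows the
inner-then-outer order of formula (9.4).

```python
from fractions import Fraction

mu1 = {1: Fraction(1, 2), 2: Fraction(3, 2)}                      # a finite measure on Omega1
mu2 = {"a": Fraction(2), "b": Fraction(1, 3), "c": Fraction(1)}   # a finite measure on Omega2

def product_measure(B):
    """mu(B) computed as in (9.4): integrate the indicator of B in w2, then in w1."""
    total = Fraction(0)
    for w1, m1 in mu1.items():
        inner = sum(m2 for w2, m2 in mu2.items() if (w1, w2) in B)  # inner integral
        total += inner * m1
    return total

rectangle = {(w1, w2) for w1 in [2] for w2 in ["a", "c"]}   # the rectangle {2} x {a, c}
non_rectangle = {(1, "a"), (2, "b")}                        # not a measurable rectangle

print(product_measure(rectangle))       # agrees with mu1({2}) * mu2({a, c}), as in (9.3)
print(product_measure(non_rectangle))
```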


Problem 19. Let X1, X2 be the two random variables defined in Example 1. Show 
that on the probability space described in the preceding paragraph, (X1, X2) is an 
independent pair. 


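Problem 20. Let $(\Psi_i, \mathcal{F}_i, \mu_i)$ be $\sigma$-finite measure spaces for $i = 1, 2, 3$. Show that

$$(\mathcal{F}_1 \times \mathcal{F}_2) \times \mathcal{F}_3 = \mathcal{F}_1 \times (\mathcal{F}_2 \times \mathcal{F}_3)$$

and

$$(\mu_1 \times \mu_2) \times \mu_3 = \mu_1 \times (\mu_2 \times \mu_3).$$

Extend to products of $n$ measure spaces, and use the result to explain and justify
the notation

$$\bigotimes_{i=1}^{n} (\Psi_i, \mathcal{F}_i, \mu_i).$$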


Problem 21. Let $Q_1, Q_2, \ldots, Q_d$ denote the distributions of some random variables
$X_1, X_2, \ldots, X_d$ defined on a common probability space. Prove that $X_i$, $1 \le i \le d$,
are independent random variables if and only if the distribution of the random
$d$-tuple $(X_1, X_2, \ldots, X_d)$ equals $Q_1 \times Q_2 \times \cdots \times Q_d$.


Problem 22. Let $\lambda$ be Lebesgue measure on $(\mathbb{R}, \mathcal{B})$, and let $(\mathbb{R}^d, \mathcal{B}^d, \lambda^d)$ be the
$d$-fold product of $(\mathbb{R}, \mathcal{B}, \lambda)$ with itself. The measure $\lambda^d$ is called $d$-dimensional
Lebesgue measure. Explain why $d$-dimensional Lebesgue measure generalizes
$d$-dimensional volume, and why integration with respect to $d$-dimensional Lebesgue
measure generalizes Riemann integration in $\mathbb{R}^d$.




9.3. The Fubini Theorem 


The following result is a preliminary step in obtaining an important tool for 
computing integrals on products of o-finite measure spaces. 


Proposition 8. Let $f$ be a measurable function from a product measurable
space $(\Psi, \mathcal{G}) \times (\Theta, \mathcal{H})$ to a measurable space. Then $x \mapsto f(x, y)$ is measurable for
each fixed $y \in \Theta$ and $y \mapsto f(x, y)$ is measurable for each fixed $x \in \Psi$.


* Problem 23. Prove the preceding proposition. Hint: It is enough to prove that
$x \mapsto (I_B \circ f)(x, y)$ is measurable for fixed $y$ and measurable $B$.


Here is a partial converse of Proposition 8; the full converse is not true. 


Proposition 9. Let $(\Theta, \mathcal{H})$ be a measurable space, and let $A$ be a Borel subset
of $\mathbb{R}$ (with the usual topology). Suppose that $f : A \times \Theta \to \mathbb{R}$ is such that: (i)
$x \mapsto f(x, y)$ is continuous for each $y \in \Theta$ and (ii) $y \mapsto f(x, y)$ is measurable for
each $x \in A$. Then $f$ is measurable.


PROOF. Let $(a_1, a_2, \ldots)$ be a countable dense sequence in $A$. For each positive
integer $n$, define $f_n$ by

$$f_n(x, y) = f(a_{j_{x,n}}, y),$$

where

$$j_{x,n} = \min\{i : |x - a_i| \le |x - a_k|, \ 1 \le k \le n\}.$$

In other words, $a_{j_{x,n}}$ is the element in the ordered $n$-tuple $(a_1, \ldots, a_n)$ closest to
$x$, with ties being broken by the ordering. For any Borel subset $B$ of $\mathbb{R}$,

$$f_n^{-1}(B) = \bigcup_{i=1}^{n} \big( A_{i,n} \times \{y : f(a_i, y) \in B\} \big),$$

where $A_{i,n} = \{x : j_{x,n} = i\}$. Hypothesis (ii) shows that $f_n^{-1}(B)$ is a finite union of
measurable rectangles and, hence, that $f_n$ is measurable. Hypothesis (i) shows
that, for each $(x, y)$, $f_n(x, y) \to f(x, y)$ as $n \to \infty$. Therefore, $f$ is
measurable. $\square$


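Theorem 10. [Fubini] Let $(\Psi, \mathcal{G}, \mu)$ and $(\Theta, \mathcal{H}, \nu)$ be two $\sigma$-finite measure
spaces, and let $\varphi$ be an $\overline{\mathbb{R}}$-valued measurable function defined on the product
measure space $(\Psi, \mathcal{G}, \mu) \times (\Theta, \mathcal{H}, \nu)$. If

$$\int \varphi \, d(\mu \times \nu)$$

exists, then

$$x \mapsto \int \varphi(x, y) \, \nu(dy) \tag{9.5}$$

is a $\mu$-almost everywhere defined measurable function from $(\Psi, \mathcal{G})$ to $\overline{\mathbb{R}}$, and

$$\int \varphi \, d(\mu \times \nu) = \int \left( \int \varphi(x, y) \, \nu(dy) \right) \mu(dx). \tag{9.6}$$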


PROOF. We give the proof for the case in which $\mu$ is a finite measure. The
extension to $\sigma$-finite measures is straightforward, using either the Fubini Theorem
for sums (Corollary 10 of Chapter 6) or the Monotone Convergence Theorem.

The theorem obviously holds when $\varphi$ is the indicator function of a measurable
rectangle. The linearity of integration and the Monotone Convergence Theorem
allow us to use the Sierpiński Class Theorem to extend the result to all
indicator functions of measurable sets in $\mathcal{G} \times \mathcal{H}$. We then can extend it to simple
functions $\varphi$ by again using the linearity of the integral, and to arbitrary
nonnegative measurable functions by using Lemma 13 of Chapter 2 and the Monotone
Convergence Theorem.

Now consider an arbitrary measurable $\overline{\mathbb{R}}$-valued function $\varphi$. Write $\varphi = \varphi^+ -
\varphi^-$. Since the result has been proved for nonnegative measurable functions,

$$\int \varphi^+ \, d(\mu \times \nu) = \int \left( \int \varphi^+(x, y) \, \nu(dy) \right) \mu(dx)$$

and

$$\int \varphi^- \, d(\mu \times \nu) = \int \left( \int \varphi^-(x, y) \, \nu(dy) \right) \mu(dx).$$

Subtraction of the left sides gives the integral of $\varphi$. Since this integral is assumed
to exist, at least one of the two iterated integrals on the right must be finite. It
follows that at least one of the two inside integrals on the right must be $\mu$-almost
everywhere finite. Both of the inside integrals are measurable functions of $x$.
Therefore, their difference is a $\mu$-almost everywhere defined measurable function
of $x$. The desired conclusion now follows by subtracting the right sides. $\square$


Remark 1. It is important to note carefully the hypotheses of the Fubini
Theorem concerning the integrand: the function $\varphi$ must be measurable and
its integral with respect to $\mu \times \nu$ must exist. In particular, one should avoid
the temptation to use the iterated integral on the right side of (9.6) directly
to determine whether $\int \varphi \, d(\mu \times \nu)$ exists (see Example 3 in the next section).
However, if $\varphi$ is measurable, the Fubini Theorem does apply to the functions
$\varphi^+$ and $\varphi^-$, since the integrals of $\overline{\mathbb{R}}^+$-valued measurable functions always exist.
Thus, (9.6) can be used to determine the finiteness of the integrals of $\varphi^+$ and
$\varphi^-$, from which the existence of the integral of $\varphi$ can be determined.


Remark 2. It is not asserted that the integral in (9.5) is defined for all $x$. An
exceptional set of measure 0 is included in the statement because both positive
and negative parts might have infinite integrals.




Remark 3. The Fubini Theorem does not generalize to arbitrary measure
spaces. To construct a simple counterexample, let $\Omega_1 = \Omega_2 = [0, 1]$ with the
corresponding Borel $\sigma$-fields, and let $\mu_1$ be Lebesgue measure and $\mu_2$ counting
measure. If $f$ is the indicator function of the diagonal of the square $\Omega_1 \times \Omega_2$,
then one of the iterated integrals in the Fubini Theorem is 0 and the other is 1.
This example also indicates that there are difficulties with our construction of
product measure in the general setting, since we used the iterated integral in
that construction.


Problem 24. Compute the following iterated integral by interchanging the order 
of integration. Be sure to justify all the steps. 


ysin(2ry/z) 
i‘ J“ (3+1) — Caray 


Problem 25. Give an alternative proof of Corollary 20 of Chapter 4 using the 
Fubini Theorem. 


Problem 26. State the Fubini Theorem for the product of three o-finite measure 
spaces. Describe how the theorem you have stated follows from Problem 20, Propo- 
sition 8, and the Fubini theorem for the product of two o-finite measure spaces. 


The following two examples serve as warnings against trying to apply the 
Fubini Theorem without carefully checking that the hypotheses are satisfied. 


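Example 3. Let $\Psi = \{1, 2, 3, \ldots\}$, $\mathcal{G}$ = all subsets of $\Psi$, and $\mu(\{i\}) = 2^{-i}$.
Define a random variable $X$ on $(\Psi, \mathcal{G}, \mu)^2$ by $X(i, j) = a_{ij}$, where $(a_{ij})$ is the
following infinite matrix:

$$\begin{pmatrix}
4 & -4 & 0 & 0 & 0 & \cdots \\
-4 & 16 & -16 & 0 & 0 & \cdots \\
0 & -16 & 64 & -64 & 0 & \cdots \\
0 & 0 & -64 & 256 & -256 & \cdots \\
\vdots & & & & & \ddots
\end{pmatrix}$$

To compute $E(X)$, we try to sum the entries in the matrix $(a_{ij} \, \mu(\{i\}) \, \mu(\{j\}))$,
which looks like this:

$$\begin{pmatrix}
1 & -\tfrac12 & 0 & 0 & 0 & \cdots \\
-\tfrac12 & 1 & -\tfrac12 & 0 & 0 & \cdots \\
0 & -\tfrac12 & 1 & -\tfrac12 & 0 & \cdots \\
0 & 0 & -\tfrac12 & 1 & -\tfrac12 & \cdots \\
\vdots & & & & & \ddots
\end{pmatrix}$$

Note that the sum of the positive terms in this second matrix is $\infty$, and the sum
of the negative terms is $-\infty$. As a consequence, the expectation of $X$ does not
exist. On the other hand, calculating the two iterated integrals in the Fubini
Theorem is equivalent to summing the row sums and summing the column sums
of this matrix. In both cases, the result is 1/2.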
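The row and column sums in Example 3 can be reproduced with exact arithmetic.
The sketch below assumes that the displayed pattern of the matrix continues
indefinitely (diagonal entries $4^i$, neighbouring off-diagonal entries
$-4^{\min(i,j)}$); the function names and the truncation level `n` are
hypothetical.

```python
from fractions import Fraction

def a(i, j):
    """Entry a_ij of the matrix in Example 3 (indices start at 1), assuming the pattern continues."""
    if i == j:
        return 4 ** i
    if abs(i - j) == 1:
        return -(4 ** min(i, j))
    return 0

def weighted(i, j):
    """a_ij * mu({i}) * mu({j}) with mu({i}) = 2**(-i)."""
    return a(i, j) * Fraction(1, 2 ** (i + j))

n = 50
# Row i has nonzero entries only in columns i-1, i, i+1, so summing to n+1 gives full row sums.
row_sums = [sum(weighted(i, j) for j in range(1, n + 2)) for i in range(1, n + 1)]
col_sums = [sum(weighted(i, j) for i in range(1, n + 2)) for j in range(1, n + 1)]
# Sum of the positive entries in the top-left n x n block; it grows linearly in n.
positive_part = sum(max(weighted(i, j), 0) for i in range(1, n + 1) for j in range(1, n + 1))

print(sum(row_sums), sum(col_sums), positive_part)
```

Both iterated sums equal 1/2, while the sum of the positive entries in the
$n \times n$ block equals $n$ and therefore diverges as $n$ grows.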


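Example 4. For the probability space $(\Psi, \mathcal{G}, \mu)^2$ of the preceding example,
define a random variable $Y$ by $Y(i, j) = b_{ij}$, where $(b_{ij})$ is the infinite matrix

$$\begin{pmatrix}
4 & -8 & 0 & 0 & 0 & \cdots \\
0 & 16 & -32 & 0 & 0 & \cdots \\
0 & 0 & 64 & -128 & 0 & \cdots \\
0 & 0 & 0 & 256 & -512 & \cdots \\
\vdots & & & & & \ddots
\end{pmatrix}$$

For computing $E(Y)$, the relevant matrix is

$$\begin{pmatrix}
1 & -1 & 0 & 0 & 0 & \cdots \\
0 & 1 & -1 & 0 & 0 & \cdots \\
0 & 0 & 1 & -1 & 0 & \cdots \\
0 & 0 & 0 & 1 & -1 & \cdots \\
\vdots & & & & & \ddots
\end{pmatrix}$$

The two iterated integrals in the Fubini Theorem both exist but are unequal;
one of them equals 1 and the other equals 0. It follows from the Fubini Theorem
that the expected value of $Y$ does not exist.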
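The unequal iterated sums of Example 4 can be computed in the same spirit.
This sketch again assumes that the displayed pattern continues (diagonal
entries $4^i$ and entries $-2 \cdot 4^i$ immediately to the right of the
diagonal); the names `b`, `weighted_b`, and `n` are hypothetical.

```python
from fractions import Fraction

def b(i, j):
    """Entry b_ij of the matrix in Example 4 (indices start at 1), assuming the pattern continues."""
    if i == j:
        return 4 ** i
    if j == i + 1:
        return -2 * 4 ** i
    return 0

def weighted_b(i, j):
    """b_ij * mu({i}) * mu({j}) with mu({i}) = 2**(-i)."""
    return b(i, j) * Fraction(1, 2 ** (i + j))

n = 50
# Row i is supported on columns i and i+1; column j is supported on rows j-1 and j.
row_sums = [sum(weighted_b(i, j) for j in range(1, n + 2)) for i in range(1, n + 1)]
col_sums = [sum(weighted_b(i, j) for i in range(1, n + 1)) for j in range(1, n + 1)]

print(sum(row_sums), sum(col_sums))
```

Every row sums to 0, while the first column sums to 1 and every other column
sums to 0, so the two orders of summation give 0 and 1.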


9.4. Expectations and independence 


Here is a version of the Fubini Theorem for a probabilistic setting. 


Proposition 11. Let (X,Y) be an independent pair of R-valued random 
variables with finite expectation. Then E(XY) = E(X)E(Y), or equivalently, 
Cov( X,Y) = 0. 


* Problem 27. Prove the preceding proposition and then deduce the following corol- 
lary. 


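Corollary 12. Let $(X_1, \ldots, X_n)$ be an independent sequence of random
variables. For each $i$, let $X_i$ take values in $(\Psi_i, \mathcal{G}_i)$, let $\varphi_i$ be a measurable function
from $(\Psi_i, \mathcal{G}_i)$ to $\mathbb{R}$, and suppose that $E(|\varphi_i \circ X_i|) < \infty$. Then

$$E\Big( \prod_{i=1}^{n} \varphi_i \circ X_i \Big) = \prod_{i=1}^{n} E(\varphi_i \circ X_i).$$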




From Proposition 11 above and Corollary 8 of Chapter 5 we conclude that if
$(X_1, \dots, X_n)$ is independent and if each $X_k$ has finite mean, then

$$\operatorname{Var}\Bigl(\sum_{k=1}^{n} X_k\Bigr) = \sum_{k=1}^{n} \operatorname{Var}(X_k).$$

Moreover, the assumption of independence can be replaced by that of pairwise
independence.
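
As a small sanity check of Proposition 11 and the variance formula, the following Python sketch builds the product measure of two finite distributions and compares both sides exactly. The distributions `Q1` and `Q2` are hypothetical, chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

# Two hypothetical finite distributions, given as {value: probability}.
Q1 = {0: Fraction(1, 2), 1: Fraction(1, 3), 4: Fraction(1, 6)}
Q2 = {-1: Fraction(1, 4), 2: Fraction(3, 4)}

def expectation(pmf, g=lambda v: v):
    return sum(g(v) * p for v, p in pmf.items())

def variance(pmf):
    m = expectation(pmf)
    return expectation(pmf, lambda v: (v - m) ** 2)

# Joint distribution of an independent pair (X, Y): the product measure Q1 x Q2.
joint = {(x, y): p * q for (x, p), (y, q) in product(Q1.items(), Q2.items())}

E_XY = sum(x * y * p for (x, y), p in joint.items())
mean_sum = sum((x + y) * p for (x, y), p in joint.items())
var_sum = sum((x + y - mean_sum) ** 2 * p for (x, y), p in joint.items())

print(E_XY == expectation(Q1) * expectation(Q2))   # product rule for expectations
print(var_sum == variance(Q1) + variance(Q2))      # additivity of variance
```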

It is possible for $\operatorname{Cov}(X,Y)$, and therefore $\operatorname{Corr}(X,Y)$, to equal 0 even if
$(X,Y)$ is not independent. Nevertheless, $\operatorname{Corr}(X,Y)$ is often used as a rough
measure of the dependence between two random variables. If $Y$ is a constant
multiple of $X$, then the correlation of $X$ and $Y$ is 1 or $-1$ according as the
constant multiple is positive or negative. On the other hand, $|\operatorname{Corr}(X,Y)|$ may
be less than 1 even if $Y = \varphi \circ X$ for some (nonrandom) function $\varphi$.
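
To illustrate the last remark, here is a minimal Python sketch with a hypothetical choice: $X$ uniform on $\{0, 1, 2\}$ and $\varphi(x) = x^2$. Although $Y = \varphi \circ X$ is completely determined by $X$, the correlation comes out strictly below 1.

```python
from fractions import Fraction
import math

values = [0, 1, 2]          # hypothetical support of X
p = Fraction(1, 3)          # X is uniform on the three values

def E(g):
    return sum(g(x) * p for x in values)

phi = lambda x: x * x       # Y = phi(X)

cov = E(lambda x: x * phi(x)) - E(lambda x: x) * E(phi)
var_x = E(lambda x: x * x) - E(lambda x: x) ** 2
var_y = E(lambda x: phi(x) ** 2) - E(phi) ** 2
corr = float(cov) / math.sqrt(float(var_x * var_y))

print(cov, var_x, var_y, corr)
```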


Problem 28. Provide an example of two dependent random variables X and Y 
whose correlation is 0. 


* Problem 29. Let $(X,Y)$ be an independent pair of $\mathbb{R}$-valued random variables.
Show that $E(X+Y) = E(X) + E(Y)$ if either side of the equation makes sense.
Give an example to show that this statement is not true if $\mathbb{R}$ is replaced by $\overline{\mathbb{R}}$.


Problem 30. Let $(X,Y)$ be an independent pair of $\mathbb{R}$-valued random variables.
Suppose that $E(X^4), E(Y^4) < \infty$. Show that

$$E\bigl([X+Y-E(X+Y)]^4\bigr) = E\bigl([X-E(X)]^4\bigr) + E\bigl([Y-E(Y)]^4\bigr) + 6\operatorname{Var}(X)\operatorname{Var}(Y).$$


9.5. Densities and independence 


The following proposition can be especially useful in computations with random 
vectors having independent components in case the components have densities 
with respect to 1-dimensional Lebesgue measure or counting measure. 


Proposition 13. Let $X_i$, $1 \le i \le d$, denote random variables on a common
probability space, and suppose that for each $i$, the distribution of $X_i$ has density
$f_i$ with respect to a $\sigma$-finite measure $\mu_i$. Then the random variables $X_1, \dots, X_d$
are independent if and only if the random vector $(X_1, \dots, X_d)$ has density

$$(x_1, \dots, x_d) \mapsto f_1(x_1)\, f_2(x_2) \cdots f_d(x_d)$$

with respect to $\mu_1 \times \cdots \times \mu_d$.


PARTIAL PROOF. We will only consider the case $d = 2$. For the 'only if' part
we assume that $X_1$ and $X_2$ are independent and intend to show that

(9.7)   $$P(\{\omega : (X_1(\omega), X_2(\omega)) \in A\}) = \int_A f_1(x_1)\, f_2(x_2)\, (\mu_1 \times \mu_2)(d(x_1, x_2))$$

for every set $A \in \mathcal{G}_1 \times \mathcal{G}_2$.

We will prove (9.7) for $A = A_1 \times A_2$, with $A_1 \in \mathcal{G}_1$ and $A_2 \in \mathcal{G}_2$. Once we
have done this, the rest is a straightforward application of the Sierpiński Class
Theorem which is left to the reader. Since each $f_i$ is a density, it is measurable
and nonnegative. Thus the product $f_1 f_2$ is nonnegative and measurable with
respect to $\mathcal{G}_1 \times \mathcal{G}_2$ (proof?). By the Fubini Theorem, the right side of (9.7)
equals

$$\int_{A_1} \Bigl( \int_{A_2} f_1(x_1)\, f_2(x_2)\, \mu_2(dx_2) \Bigr) \mu_1(dx_1).$$

The quantity $f_1(x_1)$ does not depend on $x_2$ and may therefore be taken out of
the inside integral. The inside integral then no longer depends on $x_1$, so it may
be taken outside of the outer integral, leaving the product

$$\int_{A_2} f_2(x_2)\, \mu_2(dx_2) \int_{A_1} f_1(x_1)\, \mu_1(dx_1).$$

By the definition of density, this expression equals

$$P(\{\omega : X_2(\omega) \in A_2\})\, P(\{\omega : X_1(\omega) \in A_1\}),$$

which equals the left side of (9.7) by the definition of independence.
We leave the proof of the 'if' part to the reader. □


Problem 31. Complete the preceding proof for the case d = 2 by doing the two 
things mentioned as being left for the reader. 


Problem 32. Let $(X_1, X_2)$ be an independent pair of exponentially distributed random
variables, with parameters $a_1, a_2$, respectively. Use Proposition 13 to compute
$E|X_1 - X_2|$. Then use this answer, the answer to Problem 6, and the fact that
$|x_1 - x_2| = x_1 \vee x_2 - x_1 \wedge x_2$ to calculate $E(X_1 \vee X_2)$. Finally, check this answer
via a direct calculation.


Problem 33. Let (X1, X2) be an independent pair of random variables each having 
a beta distribution with parameter pair (5,1). Find P({w: Xı (w) > [X2(w)]*}). 


The following result illustrates the usefulness of the Fubini Theorem in prob- 
ability theory, even in the absence of independence. 


Proposition 14. Let $X = (X_1, X_2)$ be a random vector that takes values in
a product space

$$(\Psi, \mathcal{G}, \mu) = (\Psi_1, \mathcal{G}_1, \mu_1) \times (\Psi_2, \mathcal{G}_2, \mu_2),$$

where $\mu_1$ and $\mu_2$ are $\sigma$-finite measures. Suppose that the distribution of $X$ has
density $f$ with respect to $\mu$. Then the distributions of $X_1$ and $X_2$ have densities

$$f_1(x_1) = \int_{\Psi_2} f(x_1, x_2)\, \mu_2(dx_2) \quad\text{and}\quad f_2(x_2) = \int_{\Psi_1} f(x_1, x_2)\, \mu_1(dx_1)$$

with respect to $\mu_1$ and $\mu_2$, respectively.




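PROOF. Let $A$ be a member of $\mathcal{G}_1$. Then

$$P(\{\omega : X_1(\omega) \in A\}) = P(\{\omega : X(\omega) \in A \times \Psi_2\}) = \int_{A \times \Psi_2} f(x_1, x_2)\, \mu(d(x_1, x_2)).$$

By the Fubini Theorem, the integral equals

$$\int_A \Bigl( \int_{\Psi_2} f(x_1, x_2)\, \mu_2(dx_2) \Bigr) \mu_1(dx_1).$$

Thus, the density of $X_1$ is as claimed. The proof for $X_2$ is similar. □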
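
When the measures $\mu_1$ and $\mu_2$ are counting measures on finite sets, the integrals in Propositions 13 and 14 are finite sums. The following Python sketch, built on hypothetical joint densities, computes the two marginal densities by summing out the other coordinate and then tests whether the joint density is the product of its marginals.

```python
from fractions import Fraction

# Hypothetical joint density on {0,1} x {0,1,2} with respect to counting measure.
f = {
    (0, 0): Fraction(1, 12), (0, 1): Fraction(1, 6), (0, 2): Fraction(1, 12),
    (1, 0): Fraction(1, 6),  (1, 1): Fraction(1, 3), (1, 2): Fraction(1, 6),
}

# A second hypothetical joint density, concentrated on the diagonal.
g = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}

def marginals(joint):
    f1, f2 = {}, {}
    for (x1, x2), p in joint.items():
        f1[x1] = f1.get(x1, 0) + p   # sum over x2 (Proposition 14)
        f2[x2] = f2.get(x2, 0) + p   # sum over x1
    return f1, f2

def is_product(joint):
    # Criterion of Proposition 13: joint density equals the product of the marginals.
    f1, f2 = marginals(joint)
    return all(joint.get((x1, x2), 0) == f1[x1] * f2[x2] for x1 in f1 for x2 in f2)

print(marginals(f))
print(is_product(f), is_product(g))
```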


The densities $f_1$ and $f_2$ are called marginal densities, although the term is
misleading. A 'marginal density' is just a 'density'; the adjective 'marginal' reflects
the fact that the density in question has arisen from integrating the density
of a random vector. When densities under consideration are with respect to Lebesgue
measure in $\mathbb{R}^d$, the following result is useful, for instance for calculating
the probability that a random vector with a known distribution belongs to some
particular set.


Theorem 15. [Change of Variables] Let $\varphi$ be an $\mathbb{R}^d$-valued invertible continuously
differentiable function defined on an open set $U \subseteq \mathbb{R}^d$. Let $B$ be a Borel
subset of $U$ and let $A = \varphi^{-1}(B)$. If $f$ is a measurable function from $\mathbb{R}^d$ to $\mathbb{R}$,
then

$$\int_B f \, d\lambda^d = \int_A (f \circ \varphi)\, |J| \, d\lambda^d,$$

where $J$ is the Jacobian determinant of the transformation $\varphi$ and $\lambda^d$ is $d$-dimensional
Lebesgue measure. The two integrals are equal in the sense that
if one exists, then both exist and are equal.
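
A one-dimensional numerical illustration of the change of variables formula may help fix the roles of $A$, $B$, and $J$. The sketch below makes hypothetical choices: $U = (0, \infty)$, $\varphi(t) = t^2$, $B = (1, 4)$, hence $A = \varphi^{-1}(B) = (1, 2)$, and $f(x) = 1/x$. Both integrals are approximated by a simple midpoint rule.

```python
import math

def midpoint_integral(g, lo, hi, n=20000):
    # Midpoint-rule approximation of the integral of g over (lo, hi).
    h = (hi - lo) / n
    return h * sum(g(lo + (k + 0.5) * h) for k in range(n))

f = lambda x: 1.0 / x
phi = lambda t: t * t           # invertible and continuously differentiable on (0, infinity)
jacobian = lambda t: 2.0 * t    # derivative of phi

# Left side: integral of f over B = (1, 4).
lhs = midpoint_integral(f, 1.0, 4.0)
# Right side: integral of (f o phi) |J| over A = (1, 2).
rhs = midpoint_integral(lambda t: f(phi(t)) * abs(jacobian(t)), 1.0, 2.0)

print(lhs, rhs, math.log(4.0))
```

Both numbers should agree with $\log 4$ up to discretization error.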


Problem 34. Prove the preceding proposition. Hint: Use a theorem from advanced 
calculus to show that the proposition is true for continuous functions f that are 0 
off a bounded set. Then use the Monotone Convergence Theorem to show it is true 
for indicators of open rectangles. Extend to indicators of measurable rectangles by 
using the Sierpiński Class Theorem. 


9.6. Product probability measure: infinitely many factors 


Many theorems in probability begin with a statement such as “Let (Xn: n = 
1,2,...) be an infinite sequence of independent identically distributed random 
variables having common distribution Q.” Such a theorem would be vacuous for 
any Q for which there were no such sequence. More generally, one might want to 
drop the assumption of identical distributions while keeping the independence 
and specifying the distribution of each term in the sequence. The highpoint of 
this section, Corollary 17, says that such a specification is possible. 




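Let $(\Omega_n, \mathcal{F}_n)$, $n = 1, 2, \dots$, be an infinite sequence of measurable spaces, and
let

$$\Omega = \bigotimes_{n=1}^{\infty} \Omega_n.$$

A measurable rectangle in $\Omega$ is a set of the form

$$\bigotimes_{n=1}^{\infty} A_n,$$

where for each $n$, $A_n \in \mathcal{F}_n$. Let $\mathcal{R}$ be the collection of measurable rectangles in
$\Omega$, and let

$$\mathcal{F} = \sigma(\mathcal{R}).$$

The $\sigma$-field $\mathcal{F}$ is called the infinite product of $\mathcal{F}_1, \mathcal{F}_2, \dots$ and is denoted by

$$\bigotimes_{n=1}^{\infty} \mathcal{F}_n.$$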


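Theorem 16. Let $((\Omega_n, \mathcal{F}_n, P_n) : n = 1, 2, \dots)$ be a sequence of probability
spaces. Let $\Omega$ be the infinite product of the spaces $\Omega_n$, and let $\mathcal{F}$ be the infinite
product of the $\sigma$-fields $\mathcal{F}_n$. Then there exists a unique probability measure $P$ on
the measurable space $(\Omega, \mathcal{F})$ such that

(9.8)   $$P\Bigl(\bigotimes_{n=1}^{\infty} A_n\Bigr) = \prod_{n=1}^{\infty} P_n(A_n)$$

for events $A_n \in \mathcal{F}_n$, $n = 1, 2, \dots$.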


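PROOF. For $n = 1, 2, \dots$ let $\mathcal{H}_n = \sigma(\mathcal{R}_n)$, where $\mathcal{R}_n$ is the collection of all
measurable rectangles of the form

$$\Bigl(\bigotimes_{k=1}^{n} A_k\Bigr) \times \Bigl(\bigotimes_{k=n+1}^{\infty} \Omega_k\Bigr).$$

Thus, any member of $\mathcal{H}_n$ can be written in the form

(9.9)   $$B \times \Omega_{n+1} \times \Omega_{n+2} \times \cdots$$

for some unique set

$$B \in \bigotimes_{k=1}^{n} \mathcal{F}_k.$$

Let

$$\mathcal{E} = \bigcup_{n=1}^{\infty} \mathcal{H}_n.$$

Note that $\mathcal{E}$ is a field, and that $\mathcal{F} = \sigma(\mathcal{E})$. (The members of $\mathcal{E}$ are called cylinder
sets, and $\mathcal{E}$ itself is the field of cylinder sets.)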




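We define $P$ on $\mathcal{E}$ by defining $P$ on each $\mathcal{H}_n$. For $A \in \mathcal{H}_n$, written in the form
(9.9), define

$$P(A) = (P_1 \times \cdots \times P_n)(B).$$

A set $A$ which is a member of $\mathcal{H}_n$ is also a member of $\mathcal{H}_{n+1}$ with $B \times \Omega_{n+1}$
taking the place of $B$. Thus we need to check that the definition of $P$ is consistent.
We apply the associative law of Problem 20:

$$\begin{aligned}
(P_1 \times \cdots \times P_{n+1})(B \times \Omega_{n+1}) &= ((P_1 \times \cdots \times P_n) \times P_{n+1})(B \times \Omega_{n+1}) \\
&= (P_1 \times \cdots \times P_n)(B)\, P_{n+1}(\Omega_{n+1}) \\
&= (P_1 \times \cdots \times P_n)(B).
\end{aligned}$$

Thus, $P$ has been consistently defined on the field $\mathcal{E}$ of cylinder sets.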
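
The consistency check just carried out can be imitated on finite coordinate spaces. In the Python sketch below, the three coordinate measures and the base set `B` are hypothetical; the function `cylinder_probability` computes $(P_1 \times \cdots \times P_n)(B)$ for a set $B$ of $n$-tuples, and the final comparison shows that replacing $B$ by $B \times \Omega_{n+1}$ does not change the value.

```python
from fractions import Fraction
from math import prod

# Hypothetical finite coordinate spaces with probability measures P_1, P_2, P_3.
P = [
    {"H": Fraction(1, 2), "T": Fraction(1, 2)},
    {1: Fraction(1, 6), 2: Fraction(1, 3), 3: Fraction(1, 2)},
    {"a": Fraction(3, 4), "b": Fraction(1, 4)},
]

def cylinder_probability(base):
    """(P_1 x ... x P_n)(B), where B is a set of n-tuples of coordinates."""
    return sum(prod(P[k][w] for k, w in enumerate(point)) for point in base)

B = {("H", 1), ("T", 3)}                              # a set in the product of the first 2 factors
B_extended = {pt + (w,) for pt in B for w in P[2]}    # the same cylinder, written as B x Omega_3

print(cylinder_probability(B), cylinder_probability(B_extended))
```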

It is clear from the definition of $P$ that (9.8) is satisfied for measurable rectangles
in each of the collections $\mathcal{R}_n$. We have not yet defined $P$ on all of $\mathcal{R}$.
However, any member of $\mathcal{R}$ can be written as the limit of members of $\bigcup_n \mathcal{R}_n$,
since for any sequence $(A_1, A_2, \dots)$ such that $A_n \in \mathcal{F}_n$, $n = 1, 2, \dots$,

$$\bigotimes_{n=1}^{\infty} A_n = \lim_{m \to \infty} \Bigl(\bigotimes_{n=1}^{m} A_n\Bigr) \times \Bigl(\bigotimes_{n=m+1}^{\infty} \Omega_n\Bigr).$$

Thus, the Continuity of Measure Theorem implies that any extension of $P$ to a
probability measure on $\mathcal{F}$ will satisfy (9.8) on all of $\mathcal{R}$.

We turn our attention to the task of extending $P$ to a probability measure on
$\mathcal{F}$. Since $\mathcal{E}$ is a field which generates $\mathcal{F}$, it suffices, by the Extension Theorem
and the companion Proposition 9 of Chapter 7, to show that the limit of a
decreasing sequence $(C_1, C_2, \dots)$ of members of $\mathcal{E}$ is nonempty if $\lim P(C_n) \ne 0$.
So, suppose that the limit equals $\varepsilon > 0$. By inserting initial terms and repetitions
into the sequence if necessary, we may assume that $C_n \in \mathcal{H}_n$ for each $n$. We will
complete the proof by identifying a point $\psi$ in $\lim C_n$.

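Let $Y_{n,n}(\omega_1, \dots, \omega_n) = I_{C_n}(\omega_1, \omega_2, \dots)$, and for $0 \le m < n$, let

$$Y_{m,n}(\omega_1, \dots, \omega_m) = \int_{\Omega_{m+1} \times \cdots \times \Omega_n} Y_{n,n}(\omega_1, \dots, \omega_n)\, (P_{m+1} \times \cdots \times P_n)(d(\omega_{m+1}, \dots, \omega_n)),$$

so that, in particular, $Y_{0,n} = P(C_n)$. By the Fubini Theorem, each $Y_{m,n}$ is
$\mathcal{G}_m$-measurable (here $\mathcal{G}_m$ denotes the product $\sigma$-field $\bigotimes_{k=1}^{m} \mathcal{F}_k$), and

(9.10)   $$Y_{m-1,n}(\omega_1, \dots, \omega_{m-1}) = \int_{\Omega_m} Y_{m,n}(\omega_1, \dots, \omega_{m-1}, \omega_m)\, P_m(d\omega_m)$$

for $0 < m \le n$.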

For fixed $m$, the sequence $(Y_{m,n} : n = m, m+1, m+2, \dots)$ decreases to a
nonnegative $\mathcal{G}_m$-measurable random variable $Y_m$, since the sequence $(C_1, C_2, \dots)$
is decreasing. The random variables $Y_{m,n}$ are all bounded by 1, so it follows from
the Bounded Convergence Theorem and (9.10) that

$$Y_{m-1}(\omega_1, \dots, \omega_{m-1}) = \int_{\Omega_m} Y_m(\omega_1, \dots, \omega_{m-1}, \omega_m)\, P_m(d\omega_m).$$

By Theorem 9 (v) of Chapter 4, for $m \ge 1$ and each $(\omega_1, \dots, \omega_{m-1})$, there exists
an $\omega_m$ such that

$$Y_m(\omega_1, \dots, \omega_{m-1}, \omega_m) \ge Y_{m-1}(\omega_1, \dots, \omega_{m-1}).$$

Thus, we may find a sequence $\psi_1, \psi_2, \dots$ such that for all $m \ge 1$,

$$Y_m(\psi_1, \psi_2, \dots, \psi_{m-1}, \psi_m) \ge Y_{m-1}(\psi_1, \psi_2, \dots, \psi_{m-1}).$$

Let $\psi = (\psi_1, \psi_2, \dots)$.
We now have that

$$\begin{aligned}
I_{C_n}(\psi) &= Y_{n,n}(\psi_1, \dots, \psi_n) \\
&\ge Y_n(\psi_1, \dots, \psi_n) \\
&\ge Y_0 = \lim P(C_n) = \varepsilon > 0
\end{aligned}$$

for all $n > 0$. Since indicator functions can only take the values 0 and 1, it
follows that $\psi \in C_n$ for each $n$. Thus, $\lim C_n$ is nonempty as desired. □


The probability measure $P$ constructed in Theorem 16 is called the infinite
product of $P_1, P_2, \dots$ and is denoted by

$$\bigotimes_{n=1}^{\infty} P_n.$$

The probability space $(\Omega, \mathcal{F}, P)$ is the infinite product of the probability spaces
$(\Omega_n, \mathcal{F}_n, P_n)$. The construction of the product of an infinite sequence of probability
measures does not generalize to infinite measures, or even to finite measures
which are not probability measures.


Corollary 17. Let (Qn, Fn, Pn), n = 1,2,..., be as in Theorem 16. There 
exists an independent sequence (X1, X2,...) such that for each n, Xn has dis- 
tribution Pn. 


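PROOF. Let $(\Omega, \mathcal{F}, P)$ be the infinite product of the spaces $(\Omega_n, \mathcal{F}_n, P_n)$, $n = 1, 2, \dots$.
For $n = 1, 2, \dots$, and $\omega = (\omega_1, \omega_2, \dots) \in \Omega$, define

$$X_n(\omega) = \omega_n.$$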


A little thought convinces one that the sequence (X1, X2,...) has the desired 
properties. O 




The random sequence (X1, Xo,...) described in the preceding proof is merely 
the identity function on an appropriate probability space, and so the target and 
the domain of this random sequence are the same. It should be emphasized that 
there are many natural situations in which the target and the domain are very 
different and yet the sequence is independent. 


Problem 35. Where in the proof of Theorem 16 did we make use of the hypothesis
that the measures $P_n$ are probability measures? What happens when you try to
construct $(J, \mathcal{B}, \lambda)^{\infty}$ where $J$ is an interval in $\mathbb{R}$ and $\lambda$ is 1-dimensional Lebesgue
measure?


The following problem connects intuitive ideas about independence with the 
definitions we have given. 


Problem 36. Let $(X,Y)$ be an independent pair of random variables, and let

$$((X_1, Y_1), (X_2, Y_2), \dots)$$

be an independent sequence of random vectors, each of which has the same distribution
as $(X,Y)$. (Such a sequence exists by Theorem 16.) Let $A$ and $B$ be Borel
subsets of $\mathbb{R}$ and assume that $P(\{\omega : X(\omega) \in A\}) > 0$. Define

$$S_n(\omega) = \sum_{i=1}^{n} I_{A \times B}(X_i(\omega), Y_i(\omega)) \quad\text{and}\quad M_n(\omega) = \sum_{i=1}^{n} I_A(X_i(\omega)).$$


Prove that for all $\varepsilon > 0$,


; Sn (w) , oe 
Think of the sequence (X1, Yi), (X2, Y2),... as representing a sequence of pairs of 


measurements which are done on the outcomes of independent repetitions of an 
experiment, and give an interpretation of (9.11) in terms of your intuitive ideas 
about the meaning of independence. 


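From (9.8) we see that for any $\varepsilon > 0$, there exists $m$ such that

$$P\Bigl(\Bigl[\bigotimes_{n=1}^{\infty} A_n\Bigr] \,\triangle\, \Bigl[\Bigl(\bigotimes_{n=1}^{m} A_n\Bigr) \times \Bigl(\bigotimes_{n=m+1}^{\infty} \Omega_n\Bigr)\Bigr]\Bigr) < \varepsilon.$$

Thus all measurable rectangles $C$ in the product $\sigma$-field can be approximated by
cylinder sets $D$ in the sense that $P(C \,\triangle\, D)$ can be made arbitrarily small by
appropriate choices of $D$. More is said in the following lemma.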


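Lemma 18. Let

$$(\Omega, \mathcal{F}, P) = \bigotimes_{n=1}^{\infty} (\Omega_n, \mathcal{F}_n, P_n),$$

where each $(\Omega_n, \mathcal{F}_n, P_n)$ is a probability space. Then for any $C \in \mathcal{F}$ and any
$\varepsilon > 0$, there exists a cylinder set $D$ for which $P(C \,\triangle\, D) < \varepsilon$.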




Problem 37. Prove the preceding lemma. Hint: Use the Sierpiński Class Theorem. 


9.7. The Borel-Cantelli Lemma and independent sequences 


As we have already seen, an independent sequence of events is pairwise uncor- 
related, so the Borel-Cantelli Lemma applies. 


Problem 38. State the Borel-Cantelli Lemma for independent sequences of events. 
Provide a new proof based on the formula in Problem 4. 


The next example illustrates an important way in which the Borel-Cantelli 
Lemma can be used to analyze the asymptotic behavior of independent sequences 
of random variables. 


Example 5. Let $(X_1, X_2, \dots)$ be an independent sequence of random variables
having the exponential distribution with mean $\lambda$. We will describe
rather precisely the large values of $X_n$ as $n \to \infty$ by proving that, with
probability 1,

$$\limsup_{n \to \infty} \frac{X_n}{\lambda \log n} = 1.$$

The two steps are to first show that the limit supremum is greater than or equal
to 1 a.s., and then to prove that the limit supremum is a.s. less than or equal to
$1 + \delta$ for all $\delta > 0$.

For each $n$, let $A_n = \{\omega : X_n(\omega) > \lambda \log n\}$. In order to apply the Borel-Cantelli
Lemma, we calculate

$$\sum_n P(A_n) = \sum_n \exp(-\log n) = \sum_n (1/n) = \infty.$$

Therefore, $P(\limsup A_n) = 1$ and thus

$$P\Bigl(\Bigl\{\omega : \limsup_{n \to \infty} \frac{X_n(\omega)}{\lambda \log n} \ge 1\Bigr\}\Bigr) = 1.$$

Now fix $\delta > 0$. For each $n$, let $B_n = \{\omega : X_n(\omega) > (1 + \delta)\lambda \log n\}$. Now we
obtain a convergent series:

$$\sum_n P(B_n) = \sum_n \exp(-(1 + \delta) \log n) = \sum_n (1/n^{1+\delta}) < \infty.$$

By the Borel Lemma, $P(\limsup B_n) = 0$ and thus

$$P\Bigl(\Bigl\{\omega : \limsup_{n \to \infty} \frac{X_n(\omega)}{\lambda \log n} \le 1 + \delta\Bigr\}\Bigr) = 1.$$

Now $\limsup X_n/(\lambda \log n) \le 1$ a.s. follows by letting $\delta \to 0$ along a countable
sequence.
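
The dichotomy between the two series in Example 5 can be seen numerically. The Python sketch below computes partial sums of $\sum_n n^{-c}$ for $c = 1$ (the events $A_n$) and for $c = 1 + \delta$ with the arbitrary choice $\delta = 1/2$ (the events $B_n$). It also simulates one sequence and reports the largest of the first $N$ values divided by $\lambda \log N$; the value of `lam` and the seed are hypothetical choices, and a single simulation is only a rough sanity check, not evidence of almost sure convergence.

```python
import math
import random

def tail_probability(c, n):
    # P(X_n > c * lam * log n) = exp(-c log n) = n**(-c) for an exponential variable with mean lam.
    return n ** (-c)

def partial_sum(c, N):
    return sum(tail_probability(c, n) for n in range(1, N + 1))

def simulated_ratio(N, lam=2.0, seed=0):
    # max(X_1, ..., X_N) / (lam * log N) for one simulated sequence.
    rng = random.Random(seed)
    largest = max(rng.expovariate(1.0 / lam) for _ in range(N))  # expovariate takes the rate 1/lam
    return largest / (lam * math.log(N))

for N in (10**2, 10**3, 10**4, 10**5):
    print(N, partial_sum(1, N), partial_sum(1.5, N))

print(simulated_ratio(10**5))
```

The first column of sums keeps growing like $\log N$, while the second stays bounded.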




Problem 39. Let $X_n$ be as in the preceding example, and let

$$Y_n = \max\{X_1, X_2, \dots, X_n\}$$

for each $n$. Prove without using probability theory that

$$\limsup_{n \to \infty} \frac{Y_n}{\lambda \log n} = \limsup_{n \to \infty} \frac{X_n}{\lambda \log n}.$$


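Example 6. This is a continuation of the preceding example. We will prove
that

$$\lim_{n \to \infty} \frac{Y_n}{\lambda \log n} = 1 \quad\text{a.s.}$$

By the preceding problem, it is enough to prove that for each $\delta > 0$,

$$\liminf_{n \to \infty} \frac{Y_n}{\lambda \log n} \ge 1 - \delta \quad\text{a.s.}$$

Let $C_n = \{\omega : Y_n(\omega) \le (1 - \delta)\lambda \log n\}$. Then

$$\begin{aligned}
\sum_n P(C_n) &= \sum_n \bigl(1 - \exp(-(1 - \delta) \log n)\bigr)^n \\
&= \sum_n \bigl(1 - n^{-(1-\delta)}\bigr)^n \\
&\le \sum_n \bigl(\exp(-n^{-(1-\delta)})\bigr)^n = \sum_n e^{-n^{\delta}} < \infty.
\end{aligned}$$

It follows from the Borel Lemma that $P(\limsup C_n) = 0$, and the desired result
follows as in the previous example by letting $\delta$ go to zero along a countable
sequence.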


Problem 40. For the sequence $(X_1, X_2, \dots)$ of Example 5, prove that for each
decreasing sequence $(b_1, b_2, \dots)$ of positive numbers, either

$$\liminf_{n \to \infty} \frac{X_n}{b_n} = \infty \ \text{a.s.} \quad\text{or}\quad \liminf_{n \to \infty} \frac{X_n}{b_n} = 0 \ \text{a.s.}$$


* Problem 41. Let $X$ be a normally distributed random variable having mean 0 and
standard deviation $\sigma$. Prove that, as $x \to \infty$,

(9.12)   $$P(\{\omega : X(\omega) > x\}) \sim \frac{\sigma}{x\sqrt{2\pi}} \exp(-x^2/2\sigma^2).$$


* Problem 42. Let $(X_1, X_2, \dots)$ be an independent sequence of normally distributed
random variables with mean 0 and standard deviation $\sigma$. Find an increasing sequence
$(a_1, a_2, \dots)$ such that $\limsup X_n/a_n = 1$ a.s. Let

$$Y_n = \max_{1 \le k \le n} X_k.$$

Is it true that $\lim Y_n/a_n = 1$ a.s.? Hint: Use the preceding exercise.




Problem 43. For $x \to \infty$, find a simple asymptotic formula for $P(\{\omega : X(\omega) > x\})$,
where $X$ has a gamma distribution.


Problem 44. Let $(X_1, X_2, \dots)$ be an independent sequence of identically distributed
gamma random variables. Construct an increasing sequence $(a_1, a_2, \dots)$
such that $\limsup X_n/a_n = 1$ a.s. Let $Y_n = \max\{X_1, \dots, X_n\}$. Is it true that
$\lim Y_n/a_n = 1$ a.s.? Hint: Use the preceding exercise.


* Problem 45. Let $(X_1, X_2, \dots)$ be an independent sequence of random variables,
where for each $n$, $X_n$ is uniformly distributed on $[0, n]$. Calculate
$P(\{\omega : X_n(\omega) \to \infty \text{ as } n \to \infty\})$.


Problem 46. Let $(X_1, X_2, \dots)$ be an independent sequence of $\mathbb{R}^2$-valued random
variables, where for each $n$, $X_n$ is uniformly distributed on $[-n, n] \times [-n, n]$. Calculate
$P(\{\omega : \|X_n(\omega)\| \to \infty \text{ as } n \to \infty\})$, where $\|\cdot\|$ denotes the usual Euclidean
norm in $\mathbb{R}^2$.


9.8. + Order statistics 


Suppose that an independent sequence of five random numbers (say, rather untypically,
$1/3$, $1/\sqrt{2}$, $4/5$, $1/7$, $3/7$) are drawn according to the uniform distribution
on $(0,1)$. They can be arranged in increasing order: $1/7$, $1/3$, $3/7$, $1/\sqrt{2}$, $4/5$.
When so arranged these are, according to a definition to be given shortly, called
the first through fifth 'order statistics'. Of course, if someone gives you the order
statistics only, you have not obtained the full information about the result of
the experiment, because the same order statistics typically come from $5! = 120$
different members of $\mathbb{R}^5$. However, for many purposes the information contained
in the order statistics is the important information. The following exercise will
help one develop facility with some of the technicalities involved in the definition
of order statistics.


* Problem 47. For $x \in \mathbb{R}^d$, use the notation $x_i$ for the $i$th coordinate of $x$. Let $\chi^{(d)}$
be the function from $\mathbb{R}^d$ to $\mathbb{R}^d$ defined by

(9.13)   $$[\chi^{(d)}(x)]_j = \min\{v \in \mathbb{R} : \#\{i : x_i \le v\} \ge j\}.$$

Discuss the appearance of minimum rather than just infimum in this definition.
Say why

(9.14)   $$[\chi^{(d)}(x)]_i \le [\chi^{(d)}(x)]_j \quad\text{for } i \le j$$

and

(9.15)   $$\#\{i : [\chi^{(d)}(x)]_i = v\} = \#\{i : x_i = v\}$$

for each $x \in \mathbb{R}^d$ and $v \in \mathbb{R}$. Describe (and, in case $d = 2$, draw) the image of
$\chi^{(d)}$. Prove that $\chi^{(d)}$ is continuous, hence measurable. Calculate the cardinality
of $(\chi^{(d)})^{-1}(\{y\})$ for all $y \in \mathbb{R}^d$.




Definition 19. Let $X = (X_1, \dots, X_d)$ be a random vector consisting of $d$ $\mathbb{R}$-valued
components, and let $Y = \chi^{(d)}(X)$. Then for $j = 1, \dots, d$, $Y_j$ is called the
$j$th order statistic of $X$. If the components of $X$ form an independent sequence
and are identically distributed according to a distribution $Q$, then we call $Y_j$ the
$j$th order statistic based on $d$ observations of $Q$.
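
In computational terms, order statistics are obtained by sorting. The Python sketch below evaluates the counting formula (9.13) literally (restricting the search for the minimum to the coordinates of $x$, which is an implementation shortcut, not part of the definition) and compares the result with the built-in sort on the five numbers mentioned at the start of this section. The function name `chi` is only illustrative.

```python
import math

def chi(x):
    """Order statistics of the sequence x, computed from the counting formula (9.13)."""
    d = len(x)
    result = []
    for j in range(1, d + 1):
        # Smallest coordinate value v such that at least j coordinates are <= v.
        result.append(min(v for v in x if sum(1 for xi in x if xi <= v) >= j))
    return result

sample = [1 / 3, 1 / math.sqrt(2), 4 / 5, 1 / 7, 3 / 7]
print(chi(sample))
print(chi(sample) == sorted(sample))
```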


The following proposition shows that order statistics based on observations 
of the uniform distribution can be used to analyze order statistics based on 
observations of arbitrary distributions on R. 


Proposition 20. Let $G_j$ be the distribution function of the $j$th order statistic
based on $d$ observations of the uniform distribution on $[0,1]$. Let $Q$ denote a
distribution on $\mathbb{R}$ with corresponding distribution function $F$. Then $G_j \circ F$ is the
distribution function of the $j$th order statistic based on $d$ observations of $Q$.


Problem 48. Prove the preceding proposition. Generalize to obtain a result for 
the distribution of the random vector whose components are the order statistics 
based on d observations of Q. 


* Problem 49. Let $Y_1 \le Y_2 \le \cdots \le Y_d$ be order statistics based on $d$ observations of
the uniform distribution on $[0,1]$. Calculate the density of the vector $(Y_1, \dots, Y_d)$.
(One assumes the problem is requesting the density with respect to $d$-dimensional
Lebesgue measure, since there is no indication of some other measure.)


Problem 50. Let $Y_1 \le Y_2 \le \cdots \le Y_d$ be order statistics based on $d$ observations
of a distribution $Q$ on $\mathbb{R}$. Assume that $Q$ has a continuous distribution function.
Prove that the distribution of the vector $Y = (Y_1, \dots, Y_d)$ on $\mathbb{R}^d$ is the restriction
of

$$d! \bigotimes_{i=1}^{d} Q$$

to the set $\{y \in \mathbb{R}^d : y_1 \le y_2 \le \cdots \le y_d\}$. Conclude that the distribution of $Y$ has
a density with respect to $d$-dimensional Lebesgue measure if $Q$ has a density with
respect to 1-dimensional Lebesgue measure.


* Problem 51. Let $(X_1, X_2, \dots)$ be an independent sequence of random variables,
each of which is uniformly distributed on the interval $[0,1]$. Let

$$N = \sup\{k \ge 1 : X_k < X_{k-1} < \cdots < X_1\}.$$

Find the distribution and mean of $N$. Let $Z = X_N$. Find the density of $Z$.


The closed interval with endpoints equal to the first and the $n$th order statistics
of a sequence of $n$ $\mathbb{R}$-valued random variables is the smallest closed interval that
contains all $n$ values. We will extend this idea to $\mathbb{R}^d$-valued random variables.

A set $C \subseteq \mathbb{R}^d$ is convex if the line segment joining each pair of points in $C$
is a subset of $C$. The intersection of any collection of closed convex sets is easily
seen to be closed and convex. Therefore, there is for each $B \subseteq \mathbb{R}^d$, a smallest
closed convex set containing $B$. It is called the convex hull of $B$ and is denoted by
$\operatorname{conv}(B)$. For $(X_1, X_2, \dots)$, an independent sequence of identically distributed
random vectors in $\mathbb{R}^d$, one studies the sequence $\operatorname{conv}(\{X_1\})$, $\operatorname{conv}(\{X_1, X_2\})$,
$\operatorname{conv}(\{X_1, X_2, X_3\})$, ... of random sets in $\mathbb{R}^d$. In $\mathbb{R}$ this amounts to studying
the sequence of intervals with endpoints equal to the largest and smallest order
statistics.


* Problem 52. Let $\Psi$ be the union of the $x$- and $y$-axes in $\mathbb{R}^2$, and let $\mathcal{G}$ be the
$\sigma$-field of sets of the form $A \cup B$, where $A$ and $B$ are Borel subsets of the $x$-
and $y$-axes, respectively. For such sets $A \cup B$, define $\mu(A \cup B) = \lambda(A) + \lambda(B)$,
where $\lambda$ denotes 1-dimensional Lebesgue measure. Define a probability measure
$Q$ by $\frac{dQ}{d\mu}(u,v) = 1/2$ if either $v = 0$ and $0 < u < 1$ or $u = 0$ and $0 < v < 1$.
Let $(X_1, X_2, X_3)$ be independent random variables, each having the distribution
$Q$. Calculate the mean area of $\operatorname{conv}(\{X_1, X_2, X_3\})$. Hint: It is not necessary to
calculate the distribution function of the area.


9.9. + Some new distributions involving independence 


The Riemann zeta function is relevant to the next exercise: for $z > 1$, let

$$\zeta(z) = \sum_{n=1}^{\infty} \frac{1}{n^z}.$$

This definition is meaningful for complex $z$ whose real part is greater than 1,
but we will be primarily interested in real $z > 1$.


* Problem 53. [Zeta Distributions] Let

$$\Omega = \{1, 2, 3, \dots\}$$

and let $\mathcal{F}$ be the $\sigma$-field consisting of all subsets of $\Omega$. For each real $z > 1$, $P_z$,
defined by

$$P_z(\{\omega\}) = \frac{1}{\zeta(z)\, \omega^z},$$

is a probability measure on the measurable space $(\Omega, \mathcal{F})$. We can also think of $P_z$
as being the distribution of the random variable $X$ defined by $X(\omega) = \omega$. Evaluate
its mean and variance in terms of the Riemann zeta function. For each $m$ calculate
the probability of the event

$$\{\omega : \omega/m \text{ is an integer}\}.$$

Calculate the limit of this probability as $z \searrow 1$.
Problem 54. This exercise is a continuation of the previous exercise. For each
positive integer $\omega$, write

$$\omega = \prod_{p \text{ prime}} p^{X_p(\omega)},$$

where $X_p(\omega) \in \{0, 1, 2, \dots\}$ denotes the power of the prime $p$ in the prime
factorization of $\omega$. For the probability space $(\Omega, \mathcal{F}, P_z)$, prove that the sequence
$(X_p : p \text{ prime})$ is independent. Also calculate and name the distribution of each
$X_p$. Notice that an independent sequence of random variables has been defined
in a natural manner on a probability space that is not a product space. Discuss
the pros and cons of constructing, on a computer say, a random natural number
according to a zeta distribution by constructing the powers of the primes in its
prime factorization.


Problem 55. Let $X = (X_1, X_2)$ be an independent pair of normally distributed
random variables having mean 0 and standard deviation $\sigma$. Calculate the distributions
of the Euclidean norm $\|X\|$ and the polar coordinate angle $\arg X$. Show
that this pair of random variables is independent.


Problem 56. Let $X$ be an $\mathbb{R}$-valued random variable with a distribution that has a
density and is symmetric about 0. Suppose that the distribution of $X^2$ is gamma
with parameters $\gamma, a$. Find the density of $X$. In particular, find the value of $\gamma$ for
which $X$ is normally distributed.


Problem 57. Let U = (X,Y) be an independent pair of identically symmetrically 
distributed random variables, each of whose squares has the same gamma distribu- 
tion. Calculate the distributions of ||U|| and arg U. Show that this pair of random 
variables is independent. 


CHAPTER 10 


Sums of Independent 
Random Variables 


In Example 3 of Chapter 1 we calculated the probability that exactly k heads 
appear in n flips of a fair coin. In view of the construction of that example and 
the definition of independence given in the preceding chapter, we see that what 
we calculated is the distribution of the sum of n independent random variables, 
each of which has the Bernoulli distribution with parameter p = 1/2. Sums of 
independent random variables constitute a major theme in probability theory. 

From a theoretical point of view, this chapter contains essentially only two
main results: some formulas for the distribution and distribution function of the
sum of two independent $\mathbb{R}^d$-valued or $\overline{\mathbb{R}}^+$-valued random variables (Proposition 3
and Proposition 4), and a formula for the probability generating function of
the sum of two independent $\overline{\mathbb{Z}}^+$-valued random variables (Theorem 5). While
the proofs of these results are relatively simple, some practice is required in
their application. The rest of the chapter is devoted to examples intended to
provide such practice. These examples include some new families of distributions
(multinomial, negative binomial, Dirichlet). We also introduce a new concept
('infinite divisibility') that will be central to much of the theory in Part 3. At the
end of the chapter, we explore sums of independent random variables in settings
other than $\mathbb{R}^d$ and $\overline{\mathbb{R}}^+$.


10.1. Convolutions of distributions 


Let $(X,Y)$ be an independent pair of $\mathbb{R}^d$- or $\overline{\mathbb{R}}^+$-valued random variables with
distributions $Q$ and $R$, respectively. Then the distribution of the random variable
$(X,Y)$ is $Q \times R$, and therefore the distribution of the sum $X + Y$ is given by

$$P(\{\omega : X(\omega) + Y(\omega) \in B\}) = (Q \times R)(\{(x,y) : x + y \in B\}).$$


Let us introduce symbolism to describe the distribution of X + Y in terms of 
the distributions of X and Y. 




Definition 1. Let $Q$ and $R$ be two distributions on $\mathbb{R}^d$ or $\overline{\mathbb{R}}^+$. The convolution
of $Q$ and $R$, written $Q * R$, is the distribution on $(\mathbb{R}, \mathcal{B})$ defined by

$$(Q * R)(B) = (Q \times R)(\{(x,y) : x + y \in B\}).$$

The shaded region in Figure 10.1 indicates the region $C$ in $\mathbb{R}^2$ corresponding
to the event $\{\omega : X(\omega) + Y(\omega) \in [a,b]\}$. That is, $(Q * R)([a,b]) = (Q \times R)(C)$.
It should be emphasized that $Q * R$ is defined whether or not $(X,Y)$ is an
independent pair but the conclusion that it is the distribution of $X + Y$ is based
upon the assumption of independence.


FIGURE 10.1. The region in $\mathbb{R}^2$ corresponding to $\{\omega : X(\omega) + Y(\omega) \in [a,b]\}$


Proposition 2. Convolution is commutative and associative. The delta distribution
at 0 is the unique identity for convolution.


PROOF. The commutativity and associativity of convolution follow immediately
from the corresponding properties for addition of random variables. The
random variable which is identically 0 is an identity for the addition of random
variables, so its distribution $\delta_0$ is an identity for convolution. If $R_0$ is any identity
for convolution, then

$$\delta_0 = R_0 * \delta_0 = R_0,$$

so the identity is unique. □




Problem 1. Why do you think the uniqueness of the additive identity for R was 
not used in the preceding proof to show uniqueness of the identity for convolution? 
Why is there no mention of an inverse for convolution? 


In view of Proposition 2 we can speak of the convolution of a finite number of
distributions without regard to order or grouping. The convolution of $n$ copies of
$Q$ is denoted by $Q^{*n}$, where, of course, $Q^{*0}$ denotes the identity for convolution,
namely the delta distribution at 0. The convolution terminology and notation
is also used for distribution functions. Thus, if $F_1$ and $F_2$ are the distribution
functions corresponding to $Q_1$ and $Q_2$ respectively, we write $F_1 * F_2$ for the
distribution function corresponding to $Q_1 * Q_2$. And the notation $F^{*n}$ is also
used where appropriate.

If $Q$ and $P$ are distributions such that $P = Q^{*n}$ for some positive integer $n$,
then $Q$ is called an $n$th convolution root of $P$, and we write

On? =. 


It will be seen in Part 3 that an important class of distributions consists of those
that have an $n$th convolution root for every positive integer $n$. Such distributions
are called infinitely divisible.


Example 1. Let $Q_1, Q_2$ be Poisson distributions with means $\lambda_1, \lambda_2$. We
wish to calculate $Q_1 * Q_2$. Since random variables with a Poisson distribution
are nonnegative integer-valued, so are their sums. Thus, it is enough to compute
$(Q_1 * Q_2)(\{x\})$ for nonnegative integers $x$.

$$\begin{aligned}
(Q_1 * Q_2)(\{x\}) &= (Q_1 \times Q_2)(\{(x_1, x_2) : x_1 + x_2 = x\}) \\
&= \sum_{x_1=0}^{x} Q_2(\{x - x_1\})\, Q_1(\{x_1\}) \\
&= \sum_{x_1=0}^{x} \frac{\lambda_2^{x-x_1} e^{-\lambda_2}}{(x - x_1)!} \cdot \frac{\lambda_1^{x_1} e^{-\lambda_1}}{x_1!}.
\end{aligned}$$

Combine the exponentials, multiply and divide by $x!$, and apply the Binomial
Theorem in this last expression to obtain

$$(Q_1 * Q_2)(\{x\}) = \frac{(\lambda_1 + \lambda_2)^x e^{-(\lambda_1 + \lambda_2)}}{x!},$$

thus identifying $Q_1 * Q_2$ as Poisson with mean $\lambda_1 + \lambda_2$. A simple inductive
argument implies that the convolution of $n$ Poisson distributions with respective
means $\lambda_1, \dots, \lambda_n$ is Poisson with mean $\lambda_1 + \cdots + \lambda_n$. It follows that for any
$\lambda > 0$, the Poisson distribution with mean $\lambda/n$ is an $n$th convolution root of
the Poisson distribution with mean $\lambda$, so every Poisson distribution is infinitely
divisible. We will see in Section 3 that each Poisson distribution has only one
$n$th convolution root.
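
The computation in Example 1 is easy to confirm numerically. The following Python sketch evaluates the convolution sum for two Poisson distributions and compares it with the Poisson probabilities for the summed mean; it also builds the $n$-fold convolution of the Poisson distribution with mean $\lambda/n$ one factor at a time. The particular means and the function names are hypothetical choices.

```python
import math

def poisson_pmf(lam, x):
    return math.exp(-lam) * lam ** x / math.factorial(x)

def convolve_at(q1, q2, x):
    # (Q1 * Q2)({x}) for distributions on the nonnegative integers.
    return sum(q2(x - x1) * q1(x1) for x1 in range(x + 1))

def convolution_power_at(lam, n, x):
    # Value at x of the n-fold convolution of Poisson(lam / n).
    base = [poisson_pmf(lam / n, k) for k in range(x + 1)]
    pmf = list(base)
    for _ in range(n - 1):
        pmf = [sum(pmf[k - i] * base[i] for i in range(k + 1)) for k in range(x + 1)]
    return pmf[x]

q1 = lambda k: poisson_pmf(1.5, k)
q2 = lambda k: poisson_pmf(2.0, k)

for x in range(6):
    print(x, convolve_at(q1, q2, x), poisson_pmf(3.5, x))

print(convolution_power_at(3.0, 4, 6), poisson_pmf(3.0, 6))
```

The values agree up to floating-point rounding, which is consistent with infinite divisibility but of course does not prove it.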




Problem 2. Calculate and identify by name $Q^{*n}$ where $Q$ is Bernoulli with parameter
$p$.


The notation $B - x$ for the set $\{y - x : y \in B\}$ will be used in the following
consequence of the Fubini Theorem, a consequence that does not apply to $\overline{\mathbb{R}}^+$
where subtraction is not universally defined. The roles of $Q_1$ and $Q_2$ in the
statement may be reversed because, according to Proposition 2, convolution is a
commutative operation.


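Proposition 3. Let $Q_1, Q_2$ be distributions on $(\mathbb{R}^d, \mathcal{B})$. Then, for $B \in \mathcal{B}$,

(10.1)   $$(Q_1 * Q_2)(B) = \int Q_2(B - x)\, Q_1(dx) = \int Q_1(B - y)\, Q_2(dy).$$

Suppose further that $dQ_2/d\lambda^d = f_2$, where $\lambda^d$ is Lebesgue measure on $\mathbb{R}^d$. Then
$d(Q_1 * Q_2)/d\lambda^d$ exists and

(10.2)   $$\frac{d(Q_1 * Q_2)}{d\lambda^d}(x) = \int f_2(x - y)\, Q_1(dy)$$

for $\lambda^d$-almost every $x$. If also $dQ_1/d\lambda^d = f_1$, then

(10.3)   $$\frac{d(Q_1 * Q_2)}{d\lambda^d}(x) = \int f_2(x - y)\, f_1(y)\, \lambda^d(dy)$$

for $\lambda^d$-almost every $x$.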
PROOF. Let $A = \{(x_1, x_2) : x_1 + x_2 \in B\}$. Then $A$ is measurable since the
function $(x_1, x_2) \mapsto x_1 + x_2$ is continuous. By the definition of convolution and
the Fubini Theorem,

$$\begin{aligned}
(Q_1 * Q_2)(B) &= \int I_A \, d(Q_1 \times Q_2) \\
&= \int \Bigl( \int I_A(x_1, x_2)\, Q_2(dx_2) \Bigr) Q_1(dx_1) \\
&= \int Q_2(B - x_1)\, Q_1(dx_1),
\end{aligned}$$

giving the first equality in (10.1).
Suppose $dQ_2/d\lambda^d = f_2$ and set

$$f(x) = \int f_2(x - y)\, Q_1(dy).$$

The function $(x,y) \mapsto f_2(x - y)$, being the composition of the measurable functions
$(x,y) \mapsto x - y$ and $f_2$, is measurable. So by the Fubini and Change of
Variables Theorems,

$$\begin{aligned}
\int_B f \, d\lambda^d &= \int_B \Bigl( \int f_2(x - y)\, Q_1(dy) \Bigr) \lambda^d(dx) \\
&= \int \Bigl( \int_B f_2(x - y)\, \lambda^d(dx) \Bigr) Q_1(dy) \\
&= \int \int_{B - y} f_2(u)\, \lambda^d(du)\, Q_1(dy) \\
&= \int Q_2(B - y)\, Q_1(dy) = (Q_1 * Q_2)(B)
\end{aligned}$$

for every $B \in \mathcal{B}$. Hence, $f = d(Q_1 * Q_2)/d\lambda^d$.
In case the other density $dQ_1/d\lambda^d$ also exists, (10.3) follows from (10.2) in
combination with Proposition 19 of Chapter 8. □
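
Formula (10.3) lends itself to a direct numerical check in dimension $d = 1$. The Python sketch below convolves two exponential densities with hypothetical rates `a` and `b` (with `a != b`) by a midpoint rule and compares the result with the closed form $ab\,(e^{-ax} - e^{-bx})/(b - a)$, which follows from an elementary integration and is used here only as a reference value.

```python
import math

def exp_density(rate):
    # Density of an exponential distribution with the given rate, extended by 0 to the negative half-line.
    return lambda x: rate * math.exp(-rate * x) if x >= 0 else 0.0

def convolve_densities(f1, f2, x, n=4000):
    # Midpoint rule for (10.3). Both densities vanish on the negative half-line,
    # so the integrand f2(x - y) f1(y) is zero outside 0 <= y <= x.
    if x <= 0:
        return 0.0
    h = x / n
    return h * sum(f2(x - (k + 0.5) * h) * f1((k + 0.5) * h) for k in range(n))

a, b = 1.0, 2.5   # hypothetical rates
closed_form = lambda x: a * b * (math.exp(-a * x) - math.exp(-b * x)) / (b - a)

for x in (0.3, 1.3, 4.0):
    print(x, convolve_densities(exp_density(a), exp_density(b), x), closed_form(x))
```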


Problem 3. Where does the proof of Proposition 3 break down if $\lambda^d$ is replaced by
an arbitrary Radon measure on $\mathbb{R}^d$?


In case $Q_1$ and $Q_2$ have densities with respect to Lebesgue measure and thus
(10.3) holds, the notation $f_1 * f_2$ is often used for $d(Q_1 * Q_2)/d\lambda^d$, although,
in case $d = 1$, this notation is somewhat at variance with the notation $F_1 * F_2$
for the distribution function corresponding to $Q_1 * Q_2$, where $F_1$ and $F_2$ are the
distribution functions of $Q_1$ and $Q_2$. But since a distribution function can never
be a probability density function, there is no real danger of confusion.

In the 1-dimensional case (10.1) can be written in terms of $F_1$ and $F_2$ in a
manner that is valid even in the $\overline{\mathbb{R}}^+$ setting.


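Proposition 4. Let $F_1$, $F_2$ be distribution functions for $\mathbb{R}$ or for $\overline{\mathbb{R}}^+$. Then,
for $x \in \mathbb{R}$,

$$
(F_1 * F_2)(x) = \int_{-\infty}^{\infty} F_2(x - y) \, dF_1(y) = \int_{-\infty}^{\infty} F_1(x - y) \, dF_2(y).
$$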


Problem 4. Prove the preceding proposition. 


Problem 5. Find the density of the sum of two independent normally distributed
random variables having means $\mu_1$ and $\mu_2$ and standard deviations $\sigma_1$ and $\sigma_2$
respectively.


Problem 6. Use the preceding problem to show that every normal distribution is 
infinitely divisible. 


Problem 7. Find the continuous density of the sum of two independent random 
variables each of which is uniformly distributed on the interval (0, 1). 




Problem 8. Show that the convolution of two not necessarily identical densities of 
Cauchy type is of Cauchy type. 


Problem 9. Suppose that $(X_1, X_2)$ is an independent pair of random variables
distributed according to gamma distributions with parameters $(a, \gamma_1)$ and $(a, \gamma_2)$.
Show that $X_1 + X_2$ is gamma distributed with parameter $(a, \gamma_1 + \gamma_2)$.


Problem 10. Let $F(x) = 1 - e^{-x}$ for $x > 0$ and $F(x) = 0$ otherwise. Represent
$F^{*n}(x)$ for $n$ a positive integer and $x > 0$ as a single definite integral on an
appropriate interval. Relate your answer to the preceding exercise.


Remark 1. Convolutions can be defined more generally than has been done
in this section. Except for the places where $\lambda^d$ is involved, the space $\mathbb{R}^d$ can be
replaced in this section by any space on which an operation analogous to sums
of vectors can be defined, an operation that is commutative and associative and
has the properties that an identity exists and every element has an inverse. Such
a structure is a commutative group. To rigorously speak of sums of random variables
having values in a commutative group, a measurable structure consistent
with the group operations should be imposed on the group. For the next two
problems the usual Borel structure will suffice even though the operation is not
standard addition, but rather an operation appropriate for viewing $(-\pi, \pi]$ as
representing the set of all rotations in $\mathbb{R}^2$ about the origin.


* Problem 11. Consider the binary operation $\oplus$ on $(-\pi, \pi]$ defined by

$$ a \oplus b = a + b - 2\pi \left\lceil \frac{a + b - \pi}{2\pi} \right\rceil . $$

In other words, $a \oplus b$ is obtained by adding $a$ and $b$ and then adding or subtracting
the appropriate multiple of $2\pi$ so that the result lies in $(-\pi, \pi]$. Corresponding to
this operation calculate $Q * R$, where $Q$ assigns probability $\frac14$ to each member of
$\{-\frac{\pi}{2}, 0, \frac{\pi}{2}, \pi\}$ and $R$ assigns probability $\frac13$ to each member of $\{-\frac{2\pi}{3}, 0, \frac{2\pi}{3}\}$.
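
The addition of angles on $(-\pi, \pi]$ used in this problem can be sketched in Python as follows. The function name `circle_add` is an assumption made for illustration, and the ceiling in the formula is computed with `math.ceil`. Floating-point rounding may matter for sums that land exactly on the boundary of the interval, so this is a sketch rather than an exact implementation.

```python
import math

def circle_add(a, b):
    """Return a + b - 2*pi*ceil((a + b - pi) / (2*pi)), a point of (-pi, pi]."""
    return a + b - 2.0 * math.pi * math.ceil((a + b - math.pi) / (2.0 * math.pi))

if __name__ == "__main__":
    # Adding pi/2 to itself gives pi, which is the upper endpoint of the interval.
    print(circle_add(math.pi / 2, math.pi / 2))
    # Adding 3*pi/4 to itself wraps around to -pi/2.
    print(circle_add(3 * math.pi / 4, 3 * math.pi / 4))
```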


Problem 12. For the structure of the preceding problem calculate $Q * Q$, where
Q = 360+ įr + ZA with $\delta_a$ denoting the delta distribution at $a$ and $\lambda$ denoting
Lebesgue measure on $(-\pi, \pi]$. More generally, calculate $Q^{*n}$ for $n \in \mathbb{Z}^+$.


10.2. Multinomial distributions 


Let $e_i$, $i = 1, \dots, d$, be the standard basis vectors in $\mathbb{R}^d$. A distribution on
$\mathbb{R}^d$ that is supported by the set $\{e_1, \dots, e_d\}$ is called a multivariate Bernoulli
distribution. The same adjective is used for any random vector having such a
distribution.




Problem 13. [Multinomial Distributions] Let $Q$ be a multivariate Bernoulli distribution
on $\mathbb{R}^d$, with $p_i = Q(\{e_i\})$, $i = 1, \dots, d$. If $(X_1, X_2, \dots, X_n)$ is an independent
sequence of random vectors, each having distribution $Q$, then the distribution
$R$ of $X_1 + \dots + X_n$ is called the multinomial distribution with parameters $p_1, \dots, p_d$,
and $n$. Show, for nonnegative integers $x_1, \dots, x_d$, that

$$
R(\{(x_1, \dots, x_d)\}) =
\begin{cases}
n! \displaystyle\prod_{i=1}^{d} \frac{p_i^{x_i}}{x_i!} & \text{if } \sum_{i=1}^{d} x_i = n \\
0 & \text{otherwise.}
\end{cases}
$$


Problem 14. Describe the connection between multivariate Bernoulli distributions 
with d = 2 and Bernoulli distributions, and also the relationship between multino- 
mial distributions with d = 2 and binomial distributions. 


Problem 15. Describe the relationship between the multinomial distribution and 
the experiment of placing n balls into d urns according to certain probabilities. 


Problem 16. Describe the relationship between the multinomial distribution and 
the experiment of throwing a possibly unbalanced d-faced die n times. 


* Problem 17. Fix $p \in [0, 1]$ and let $(X_1, X_2, \dots)$ be a (not necessarily independent)
sequence of random variables, with $X_n$ being binomially distributed with parameters
$p$ and $n$. Define $X_0 = 0$. Let $N$ be Poisson with mean $\lambda$ and assume that
$N$ is independent of the sequence $(X_1, X_2, \dots)$. Show that the distribution of the
random variable

$$ \omega \mapsto X_{N(\omega)}(\omega) $$

is Poisson with mean $p\lambda$.


Problem 18. Fix $p_1, \dots, p_d$ nonnegative with sum 1. Let $Z = (Z_1, Z_2, Z_3, \dots)$ be
an independent sequence of $\mathbb{R}^d$-valued random vectors, with $Z_n$ being multinomially
distributed with parameters $p_1, \dots, p_d$, and $n$. Let $N$ be Poisson with mean
$\lambda$ and assume that $N$ is independent of the sequence $Z$. Show that the random
vector

$$ \omega \mapsto Z_{N(\omega)}(\omega) $$

has independent coordinates, and that the $i^{\text{th}}$ coordinate is Poisson with mean $\lambda p_i$.
Use the result of Example 1 as a partial check of this conclusion.


10.3. Probability generating functions and sums in $\overline{\mathbb{Z}}^+$


Convolutions of a large number of distributions, or even of just two, are in 
many cases difficult to calculate. The following result says that the probability 
generating function of a convolution may be easy to calculate. 


Theorem 5. For $i = 1, 2$, let $\rho_i$ be the probability generating function of a
distribution $Q_i$ supported by $\overline{\mathbb{Z}}^+$. Then the probability generating function of
$Q_1 * Q_2$ is $\rho_1 \rho_2$.




PROOF. Let $X_1$ and $X_2$ be independent random variables having distributions
$Q_1$ and $Q_2$. In view of Proposition 11 of Chapter 9, the probability generating
function of $Q_1 * Q_2$ is given by

$$
E(s^{X_1 + X_2}) = E(s^{X_1} s^{X_2}) = E(s^{X_1}) \, E(s^{X_2}) = \rho_1(s) \rho_2(s). \qquad \square
$$
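
The statement of Theorem 5 can be checked exactly on small examples. The sketch below represents a distribution on $\{0, 1, 2, \dots\}$ with finite support as a list of point masses (so no mass at $\infty$ is allowed, which is a simplifying assumption), convolves two such lists, and compares the generating function of the convolution with the product of the generating functions. The two distributions and the evaluation point are hypothetical values chosen only for illustration.

```python
from fractions import Fraction

def convolve(q1, q2):
    """Convolution of two finitely supported distributions given as lists of masses."""
    out = [Fraction(0)] * (len(q1) + len(q2) - 1)
    for i, a in enumerate(q1):
        for j, b in enumerate(q2):
            out[i + j] += a * b
    return out

def pgf(q, s):
    """Probability generating function: the sum over k of q[k] * s**k."""
    return sum(mass * s**k for k, mass in enumerate(q))

# Hypothetical distributions on {0, 1, 2} and {0, 1}, and a hypothetical point s.
q1 = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
q2 = [Fraction(1, 4), Fraction(3, 4)]
s = Fraction(2, 5)

if __name__ == "__main__":
    print(pgf(convolve(q1, q2), s), pgf(q1, s) * pgf(q2, s))
```

Exact rational arithmetic is used so that the two sides can be compared with `==` rather than up to rounding.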


By Problem 30 of Chapter 5, the probability generating function of a Poisson
distribution with mean $\lambda$ is $\rho_\lambda(s) = e^{-\lambda(1 - s)}$. We see that

$$ \rho_{\lambda_1 + \lambda_2} = \rho_{\lambda_1} \rho_{\lambda_2} $$

and

$$ \rho_\lambda = (\rho_{\lambda/n})^n $$

for each positive integer $n$, two facts that were already discussed in another form
in Example 1.

In general, probability generating functions can be useful for determining
whether or not a distribution on $\overline{\mathbb{Z}}^+$ has an $n^{\text{th}}$ convolution root. The requirement
that the power series of a probability generating function have nonnegative
coefficients implies that only a positive root of a probability generating function
can itself be a probability generating function. It follows that a distribution on
$\overline{\mathbb{Z}}^+$ can have at most one $n^{\text{th}}$ convolution root supported by $\overline{\mathbb{Z}}^+$. It turns out
that it can have at most one $n^{\text{th}}$ convolution root of any sort, and, in certain
cases, there is such a convolution root that is not supported by $\overline{\mathbb{Z}}^+$. The full
story is worked out in the following problem.


Problem 19. Let $P$ be a distribution supported by $\overline{\mathbb{Z}}^+$ and suppose that $P = Q^{*n}$.
Prove that $Q$ is uniquely determined by $P$ and is supported by $\{0 + k/n, 1 + k/n, \dots, \infty\}$,
where $k$ is the smallest integer such that $P(\{k\}) > 0$. Hint: First
consider the case $k = 0$. For that case show that $Q$ is supported by $\overline{\mathbb{R}}^+$, then show
that $Q(\{0\}) > 0$, and finally show that $Q$ is supported by $\overline{\mathbb{Z}}^+$.


Problem 20. Let $(X_1, \dots, X_n)$ and $(Y_1, \dots, Y_n)$ each be independent sequences of
$\overline{\mathbb{Z}}^+$-valued random variables. Suppose that the random variables $X_1, \dots, X_n$ all
have distribution $Q$ and that the random variables $Y_1, \dots, Y_n$ all have distribution
$R$. If

$$ X_1 + \dots + X_n \quad \text{and} \quad Y_1 + \dots + Y_n $$

have the same distribution, what is the relationship between $Q$ and $R$?


* Problem 21. Let P be the distribution defined by P({0}) = P({2}) = 1/4 and 
P({1}) = 1/2. Show that P has a second convolution root but no third convolution 
root. 


We conclude this section by introducing a new family of distributions on $\mathbb{Z}^+$.
We will see that the distributions in this family are infinitely divisible.

It will be convenient to define some new notation. The product
$x(x - 1)(x - 2) \cdots (x - k + 1)$ arises sufficiently often to deserve a name. It is called a falling
factorial, and we denote it by $(x)_k^{\downarrow}$. Here $x$ is any real number and $k$ is a nonnegative
integer; $(x)_0^{\downarrow} = 1$, consistent with the convention that empty products
equal 1. (A variety of notations are used by other authors for falling factorials.)
It is also convenient to define the rising factorial: $(x)_k^{\uparrow} = x(x + 1) \cdots (x + k - 1)$.
Note that

$$ (x)_k^{\uparrow} = (x + k - 1)_k^{\downarrow} $$

for all real numbers $x$ and nonnegative integers $k$.
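
The two factorial products and the identity relating them can be sketched in a few lines of Python. The function names `falling` and `rising` are assumptions made for this illustration; exact rational arithmetic from the standard `fractions` module is used in the demonstration so that the identity can be checked without rounding.

```python
from fractions import Fraction

def falling(x, k):
    """Falling factorial x (x - 1) ... (x - k + 1); the empty product (k = 0) is 1."""
    result = 1
    for j in range(k):
        result *= x - j
    return result

def rising(x, k):
    """Rising factorial x (x + 1) ... (x + k - 1); the empty product (k = 0) is 1."""
    result = 1
    for j in range(k):
        result *= x + j
    return result

if __name__ == "__main__":
    x = Fraction(-7, 2)   # any real number is allowed; a rational keeps the check exact
    for k in range(6):
        print(k, rising(x, k), falling(x + k - 1, k))
```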


Problem 22. Let $0 < p < 1$. Use the Binomial Theorem to show that

$$ \sum_{k=0}^{\infty} (1 - p)^r \binom{-r}{k} (-p)^k = 1 $$

for every real number $r$. Also show that the summands in the above expression
are equal to

$$ (1 - p)^r \, \frac{(r)_k^{\uparrow}}{k!} \, p^k , \qquad k = 0, 1, 2, \dots . $$


Problem 23. [Negative Binomial Distributions] Let $0 < p < 1$ and $r > 0$. Use the
preceding exercise to show that

$$ k \mapsto (1 - p)^r \, \frac{(r)_k^{\uparrow}}{k!} \, p^k , \qquad k \in \mathbb{Z}^+, $$

is the density with respect to counting measure of a probability distribution; it is
called the negative binomial distribution with parameters $p$ and $r$. Show that the
probability generating function of this distribution is $s \mapsto [(1 - p)/(1 - ps)]^r$. Also
calculate the mean and variance of each negative binomial distribution. For which
$r$ is a negative binomial distribution a geometric distribution?


Problem 24. Use probability generating functions to show that the sum of n in- 
dependent geometrically distributed random variables with the same mean has a 
negative binomial distribution, and describe the relation of the parameters of the 
negative binomial distribution to n and the parameter of the geometric distribu- 
tion. Calculate the distribution of the sum of two independent random variables 
having negative binomial distributions with the same p but possibly different r’s, 
where p and r have the same meaning as in the preceding problem. Is it meaningful 
to consider r = 0, and, if so, do your calculations encompass that case? 


Problem 25. Prove that geometric distributions and, more generally, negative bi- 
nomial distributions are infinitely divisible. 


Problem 26. Show that binomial distributions are not infinitely divisible. 




10.4. † Dirichlet distributions


In this section we discuss a family of distributions that are important in statistics. 
We warm up with an exercise. 


Problem 27. Let $X_1$ and $X_2$ be as in Problem 9. Show that $X_1/(X_1 + X_2)$ has
a beta distribution. Calculate its parameters. Prove that the distribution of the
random vector $(X_1, X_2)/(X_1 + X_2)$ is absolutely continuous with respect to
1-dimensional Lebesgue measure on the line $x_1 + x_2 = 1$ and calculate its derivative
with respect to $\lambda$. (See also the following example.)


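Example 2. Let $d$ be an integer greater than 1. Let $X_1, \dots, X_d$ be random
variables distributed according to gamma distributions with corresponding parameters
$(a, \gamma_i)$, and suppose that the sequence $(X_1, \dots, X_d)$ is independent. Let
us calculate the distribution of the random vector

$$ (Y_1, \dots, Y_d) = \frac{(X_1, \dots, X_d)}{X_1 + \dots + X_d} . $$

It is clear that its distribution is supported by the intersection $T$ of the
$(d - 1)$-dimensional hyperplane $y_1 + \dots + y_d = 1$ and the orthant
$\{(y_1, \dots, y_d) : y_i > 0 \text{ for each } i\}$.

For $A$ a Borel subset of $T$, let

$$ B = \left\{ (x_1, \dots, x_d) : \frac{(x_1, \dots, x_d)}{x_1 + \dots + x_d} \in A \right\} . $$

By Proposition 13 of Chapter 9 we can write

$$
P\left( \left\{ \omega : \frac{(X_1(\omega), \dots, X_d(\omega))}{X_1(\omega) + \dots + X_d(\omega)} \in A \right\} \right)
= \int_B \prod_{i=1}^{d} \frac{a^{\gamma_i} x_i^{\gamma_i - 1} e^{-a x_i}}{\Gamma(\gamma_i)} \, d(x_1, \dots, x_d),
$$

where $d(x_1, \dots, x_d)$ indicates that the integration is with respect to Lebesgue
measure in $\mathbb{R}^d$. We make a change of variables:

$$ s = x_1 + \dots + x_d \quad \text{and} \quad y_i = \frac{x_i}{s}, \quad 1 \le i \le d - 1. $$

The Jacobian of $(x_1, \dots, x_d)$ with respect to $(y_1, \dots, y_{d-1}, s)$ is $s^{d-1}$. Thus, with
$\gamma = \gamma_1 + \dots + \gamma_d$ and

$$ C = \{ (y_1, \dots, y_{d-1}) : (y_1, \dots, y_{d-1}, 1 - y_1 - \dots - y_{d-1}) \in A \}, $$

the above integral equals

$$
\begin{aligned}
& \int_{C \times (0, \infty)} a^{\gamma} s^{\gamma - 1} e^{-as} \,
  \frac{(1 - y_1 - \dots - y_{d-1})^{\gamma_d - 1}}{\Gamma(\gamma_d)}
  \prod_{i=1}^{d-1} \frac{y_i^{\gamma_i - 1}}{\Gamma(\gamma_i)} \, d(y_1, \dots, y_{d-1}, s) \\
&\quad = \int_C \int_0^{\infty} u^{\gamma - 1} e^{-u} \, du \,
  \frac{(1 - y_1 - \dots - y_{d-1})^{\gamma_d - 1}}{\Gamma(\gamma_d)}
  \prod_{i=1}^{d-1} \frac{y_i^{\gamma_i - 1}}{\Gamma(\gamma_i)} \, d(y_1, \dots, y_{d-1}) \\
&\quad = \Gamma(\gamma) \int_C
  \frac{(1 - y_1 - \dots - y_{d-1})^{\gamma_d - 1}}{\Gamma(\gamma_d)}
  \prod_{i=1}^{d-1} \frac{y_i^{\gamma_i - 1}}{\Gamma(\gamma_i)} \, d(y_1, \dots, y_{d-1}).
\end{aligned}
$$

Thus we have a density for $(X_1, \dots, X_{d-1})/(X_1 + \dots + X_d)$ with respect to
Lebesgue measure on

$$ S = \{ (y_1, \dots, y_{d-1}) : y_i > 0 \text{ for each } i \text{ and } y_1 + \dots + y_{d-1} < 1 \}. $$

The function $y_d = 1 - y_1 - \dots - y_{d-1}$ gives a natural one-to-one correspondence
between $S$ and $T$. This correspondence maps $C$ onto $A$. Also, the Lebesgue
measure of a subset of $S$ equals $1/\sqrt{d}$ times the Lebesgue measure of its image
in $T$ under this map (proof?). Therefore, with respect to Lebesgue measure on
$T$, the random vector $(Y_1, \dots, Y_d)$ has the density

$$
\frac{\Gamma(\gamma)}{\sqrt{d}} \prod_{i=1}^{d} \frac{y_i^{\gamma_i - 1}}{\Gamma(\gamma_i)}
\tag{10.4}
$$

for $y \in T$, where $\gamma = \gamma_1 + \dots + \gamma_d$.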
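
The construction in this example is easy to simulate. The sketch below draws independent gamma variables with Python's standard `random.gammavariate` and normalizes them by their sum. It assumes that the first gamma parameter $a$ of the text is a rate, so that it corresponds to scale $1/a$ in `gammavariate(alpha, beta)`, whose arguments are shape and scale. The particular shapes, rate, seed, and sample size are hypothetical values chosen only for illustration.

```python
import random

def normalized_gammas(shapes, rate, rng):
    """One draw of (X_1, ..., X_d) / (X_1 + ... + X_d) for independent gamma X_i."""
    # gammavariate(alpha, beta) takes shape alpha and scale beta; rate a means beta = 1/a.
    xs = [rng.gammavariate(shape, 1.0 / rate) for shape in shapes]
    total = sum(xs)
    return [x / total for x in xs]

rng = random.Random(2024)      # fixed seed, chosen arbitrarily
shapes = [2.0, 3.0, 5.0]       # hypothetical gamma_1, gamma_2, gamma_3
draws = [normalized_gammas(shapes, rate=1.7, rng=rng) for _ in range(20_000)]
means = [sum(d[i] for d in draws) / len(draws) for i in range(len(shapes))]

if __name__ == "__main__":
    print(means)
```

Every draw lies in the set $T$ of the example: its coordinates are positive and sum to 1. The empirical coordinate means can be compared with $\gamma_i/\gamma$, and the output should not depend noticeably on the value of `rate`.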


Problem 28. Explain how calculations in the preceding example show that
$X_1 + \dots + X_d$ is independent of the random vector $(Y_1, \dots, Y_d)$.


Problem 29. It is apparent that the expression in (10.4) factors as a product of
terms $f_i(y_i)$, $i = 1, \dots, d$. Can we conclude from Proposition 13 of Chapter 9 that
the sequence $(Y_1, \dots, Y_d)$ is independent? Why or why not?


* Problem 30. [Dirichlet Distributions] Let $T$ be the subset of $\mathbb{R}^d$ defined in Example 2,
and let $Y = (Y_1, \dots, Y_d)$ be a $T$-valued random vector with the density given
in (10.4). We say that $Y$ has a Dirichlet distribution with parameters $\gamma_1, \dots, \gamma_d$.
Compute the mean vector and covariance matrix of $Y$. Compute the variance of
$Y_1 + \dots + Y_d$ and use your answer to show that the determinant of the covariance
matrix of $Y$ is zero.


Problem 31. Let $Y = (Y_1, \dots, Y_d)$ be as in the preceding exercise. Choose a partition
$\{K_1, \dots, K_m\}$, $m > 1$, of the set $\{1, \dots, d\}$, and let $Z$ be the $\mathbb{R}^m$-valued
random vector with coordinates

$$ Z_i = \sum_{j \in K_i} Y_j . $$

Show that $Z$ has a Dirichlet distribution and find the corresponding parameters.




Problem 32. Let $Y = (Y_1, \dots, Y_d)$ be as in the preceding exercise. Choose integers
$1 \le j_1 < j_2 < \dots < j_k \le d$, and let $\tilde{Y}$ be the $\mathbb{R}^k$-valued random variable with
components $\tilde{Y}_i = Y_{j_i}$, $i = 1, \dots, k$. Define $Z$ to be the $\mathbb{R}^k$-valued random variable
obtained by dividing $\tilde{Y}$ by the sum of its components. Show that $Z$ has a Dirichlet
distribution and find the corresponding parameters.


The following theorem describes how a Dirichlet distribution arises in connec- 
tion with order statistics. 


Theorem 6. Let $Y_1 < Y_2 < \dots < Y_d$ be order statistics based on $d$ observations
of the uniform distribution on the unit interval $[0, 1]$. Set $Y_0 = 0$ and
$Y_{d+1} = 1$. Then the distribution of the random vector

$$ ((Y_1 - Y_0), (Y_2 - Y_1), \dots, (Y_d - Y_{d-1}), (Y_{d+1} - Y_d)) $$

is the Dirichlet distribution having $d + 1$ parameters each equal to 1.


PROOF. According to Problem 50 of Chapter 9, the distribution of the vector
$(Y_1, \dots, Y_d)$ is the uniform distribution on

$$ \{ (y_1, y_2, \dots, y_d) : 0 < y_1 < y_2 < \dots < y_d < 1 \}. $$

Consider the linear transformation from this region to the region

$$
\{ (z_1, z_2, \dots, z_d) : 0 < z_i \text{ for all } i \text{ and } z_1 + z_2 + \dots + z_d < 1 \}
\tag{10.5}
$$

defined by

$$ z_1 = y_1 \quad \text{and} \quad z_i = y_i - y_{i-1} \text{ for } 2 \le i \le d. $$

Its determinant is easily shown to equal 1. Any linear transformation with
nonzero determinant takes a uniform distribution on any set to a uniform distribution
on the image of that set, because it preserves ratios of volumes. Therefore,

$$
((Y_1 - Y_0), (Y_2 - Y_1), \dots, (Y_d - Y_{d-1}))
\tag{10.6}
$$

is uniformly distributed on the region (10.5). As the discussion in the paragraph
preceding Problem 28 indicates, the uniform distribution for the random vector
(10.6) corresponds to the Dirichlet distribution with all parameters equal to 1
for the random vector

$$ ((Y_1 - Y_0), (Y_2 - Y_1), \dots, (Y_d - Y_{d-1}), (Y_{d+1} - Y_d)). \qquad \square $$
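
The random vector of Theorem 6 can be generated directly by sorting uniform observations and taking successive differences. The following Python sketch does this; the function name `uniform_spacings`, the seed, and the sample size are assumptions made for illustration only.

```python
import random

def uniform_spacings(d, rng):
    """Spacings (Y_1 - Y_0, ..., Y_{d+1} - Y_d) of d sorted uniform observations."""
    ys = sorted(rng.random() for _ in range(d))
    padded = [0.0] + ys + [1.0]          # Y_0 = 0 and Y_{d+1} = 1
    return [padded[i + 1] - padded[i] for i in range(d + 1)]

rng = random.Random(7)                   # fixed seed, chosen arbitrarily
samples = [uniform_spacings(4, rng) for _ in range(20_000)]
first_mean = sum(s[0] for s in samples) / len(samples)

if __name__ == "__main__":
    print(samples[0], first_mean)
```

Each sample has $d + 1$ nonnegative coordinates summing to 1. By the symmetry of a Dirichlet distribution with equal parameters, all coordinates have the same mean, so the empirical mean of the first spacing should be close to $1/(d + 1)$.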


* Problem 33. Let $d = 2$ in the notation of the preceding theorem. Calculate the
distribution function of

$$ Y_1 \wedge (Y_2 - Y_1) \wedge (1 - Y_2). $$


Problem 34. Let $d = 2$ in the notation of Theorem 6. Calculate the distribution
function of

$$ Y_1 \vee (Y_2 - Y_1) \vee (1 - Y_2). $$




Problem 35. Let $d = 2$ in the notation of Theorem 6. Calculate the probability
that at least one of the three random variables $Y_1$, $(Y_2 - Y_1)$, and $1 - Y_2$ is greater than
$ and at least one other of them is less than z.


* Problem 36. In the notation of Theorem 6 find the distribution of $Y_d - Y_1$.


* Problem 37. Find the distribution function of the random area described in Prob- 
lem 52 of Chapter 9. 


10.5. † Random sums in various settings


In Remark 1 we noted that much of the theory of sums of independent $\mathbb{R}^d$-valued
random variables carries over to the sums of independent random variables taking 
values in a measurable commutative group. Here is another example illustrating 
this fact. 


Example 3. The Galois field $GF(2)$ has two members, 0 and 1. Addition
and multiplication are defined as in the real numbers, with the exception that
$1 + 1 = 0$. Thus, 0 is the identity for the binary operation of addition, and 1 is
the identity for the binary operation of multiplication. Let $\mathcal{G}$ be the $\sigma$-field of
all four subsets of $GF(2)$. Notice that $GF(2)$ is a commutative group with the
operation of addition.

The distribution of a random variable from a probability space $(\Omega, \mathcal{F}, P)$ into
$(GF(2), \mathcal{G})$ is characterized by a number $p \in [0, 1]$, defined to be the probability
that the random variable equals 1. For this setting, we can use the notation

$$ p_1 * \dots * p_n = P\left( \left\{ \omega : \sum_{i=1}^{n} X_i(\omega) = 1 \right\} \right), $$

where $(X_1, \dots, X_n)$ is an independent sequence of $GF(2)$-valued random variables
and $p_i = P(\{\omega : X_i(\omega) = 1\})$, $1 \le i \le n$. If the random variables $X_i$
are identically distributed with $p_i = p$, we can use the notation $p^{*n}$. We might
imagine the following experiment. On day 0, an ‘on-off’ switch is placed in the
‘off’ position. On each succeeding day, the position of the two-way switch is
changed from its position of the previous day with probability $p$, independent
of past history. On day $n$, $p^{*n}$ is the probability that the switch is in the ‘on’
position.
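
The switch experiment can be computed exactly by brute force. The sketch below obtains the convolution of two distributions on $GF(2)$ by enumerating the four outcomes of an independent pair and keeping those whose sum modulo 2 equals 1, then iterates this day by day. The function names and the value $p = 1/4$ are assumptions made for illustration.

```python
from fractions import Fraction

def gf2_convolve(p1, p2):
    """P(X1 + X2 = 1 in GF(2)) for independent X1, X2 with P(Xi = 1) = pi."""
    total = Fraction(0)
    for x1, w1 in ((0, 1 - p1), (1, p1)):
        for x2, w2 in ((0, 1 - p2), (1, p2)):
            if (x1 + x2) % 2 == 1:       # addition in GF(2) is addition modulo 2
                total += w1 * w2
    return total

def switch_on_probability(p, n):
    """Probability that the switch is 'on' on day n, starting 'off' on day 0."""
    result = Fraction(0)
    for _ in range(n):
        result = gf2_convolve(result, p)
    return result

if __name__ == "__main__":
    p = Fraction(1, 4)
    for n in range(6):
        print(n, switch_on_probability(p, n))
```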


Problem 38. This problem concerns the preceding example. Find a formula for
$p_1 * p_2$. Suppose that $0 < p < 1$. Prove that $p^{*n} \to 1/2$ as $n \to \infty$. Hint: Obtain a
recursive formula for $p^{*n}$.


Problem 39. Make appropriate adjustments in Example 3 to focus on the operation 
of multiplication in GF(2) instead of addition. 




Remark 2. The existence of inverses, required in a group, is not used in the 
definition of convolution, although it is used in Proposition 3. Thus, one may 
speak of sums of independent random variables in structures having an operation 
that is commutative and associative and for which there is an identity. Such a 
structure is a commutative semigroup. One example of a semigroup that is not a 
group is $\overline{\mathbb{R}}^+$. Another is found in Problem 39. For semigroups to be turned into
measurable spaces it is required that the semigroup operation be measurable, but 
we will not explicitly treat measurability issues in the examples that we give. 


We conclude with some examples involving ‘sums’ of random sets. We first
need some terminology and notation. For two compact convex sets $A$ and $B$
contained in $\mathbb{R}^2$ we let $A \vee B$ denote the convex hull of $A \cup B$; that is, $A \vee B$ is
the intersection of all compact convex sets containing $A \cup B$. The next problem
treats the only nonobvious aspect of showing that the set of compact convex sets
with the operation $\vee$ is a semigroup.


* Problem 40. Prove that the operation $\vee$ on convex compact subsets of $\mathbb{R}^2$ is associative.


Problem 41. Sketch a picture of

$$ \{x \in \mathbb{R}^2 : x_1 = 0, |x_2| \le 1\} \vee \{x \in \mathbb{R}^2 : x_2 = 0, |x_1| \le 1\}. $$


Let $A$ be a compact convex subset of $\mathbb{R}^2$. For $\varphi \in [0, 2\pi)$, let

$$ h(\varphi) = \sup\{x_1 \cos\varphi + x_2 \sin\varphi : (x_1, x_2) \in A\}. $$

The function $h$, illustrated in Figure 10.2, is called the support function of $A$.


FIGURE 10.2. The support function h of a set A 
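
For a set that is the convex hull of finitely many points, the supremum in the definition of the support function is attained at one of the generating points, because a linear function on a convex hull is maximized at an extreme point. Under that assumption the support function reduces to a maximum over the list of points, as in the following Python sketch. The function name and the triangle are hypothetical choices for illustration.

```python
import math

def support_function(points, phi):
    """Support function at angle phi of the convex hull of a finite list of points."""
    # A linear function attains its supremum over a convex hull at a generating
    # point, so a maximum over the list suffices.
    return max(x1 * math.cos(phi) + x2 * math.sin(phi) for x1, x2 in points)

triangle = [(0.0, 0.0), (2.0, 0.0), (0.0, 1.0)]   # hypothetical example set

if __name__ == "__main__":
    for phi in (0.0, math.pi / 2, math.pi, 3 * math.pi / 2):
        print(phi, support_function(triangle, phi))
```

Since the convex hull of a union is generated by the concatenated point lists, the same function applied to `a_points + b_points` gives the support function of the hull of the union of the two hulls.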




Problem 42. Calculate the support function of an arbitrary one-point set in $\mathbb{R}^2$.


* Problem 43. Calculate the support functions of the three compact convex sets
involved in Problem 41.


Problem 44. Let $A$ and $B$ be two convex compact subsets of $\mathbb{R}^2$ having respective
support functions $h_A$ and $h_B$. Prove that the support function of $A \vee B$ is the
function $h_A \vee h_B$.


Problem 45. Prove that distinct compact convex subsets of $\mathbb{R}^2$ have distinct support
functions.


The collection of compact convex subsets of $\mathbb{R}^2$ is easily shown to be a closed
subcollection in the space of all compact subsets of $\mathbb{R}^2$. Therefore, it is a metric
space with the Hausdorff metric, and as such it is a measurable space with the
Borel $\sigma$-field. In view of the preceding problem there is a one-to-one correspondence
between the space of compact convex subsets of $\mathbb{R}^2$ and the set of support
functions. This correspondence can be used to define a $\sigma$-field of subsets of the
set of support functions.


Problem 46. Let $h \mapsto h(\varphi)$ be the function which assigns to each support function
its value at $\varphi$ for some fixed $\varphi$. Prove that this function is measurable, when the
measurable sets of support functions are taken to be those described in the last
sentence of the preceding paragraph.


* Problem 47. Let $(X_1, X_2, X_3)$ be an independent triple of $\mathbb{R}^2$-valued random variables,
each of which is uniformly distributed on the unit disk $\{x : \|x\| < 1\}$. For
each $\omega$ in the underlying probability space, let $\varphi \mapsto H_\varphi(\omega)$ denote the support
function of $\{X_1(\omega)\} \vee \{X_2(\omega)\} \vee \{X_3(\omega)\}$. For each $\varphi$ calculate the distribution of
the $\mathbb{R}$-valued random variable $H_\varphi$.


For two compact convex subsets $A$ and $B$ of $\mathbb{R}^2$ let

$$ A + B = \{x + y : x \in A, \, y \in B\}. $$

The set $A + B$, which is necessarily compact and convex, is called the Minkowski
sum of $A$ and $B$. As an illustration, Figure 10.3 shows how adding a small disc
to a rectangular region rounds the corners of the rectangular region.
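
For polygons given by finite lists of generating points, the Minkowski sum can be sketched by forming all pairwise sums: the convex hull of the pairwise sums is the Minkowski sum of the two convex hulls. The Python sketch below does this and evaluates support functions as a maximum over generating points. The function names and the two example sets are assumptions made for illustration.

```python
import math

def minkowski_sum(a_points, b_points):
    """All pairwise sums; their convex hull is the Minkowski sum of the two hulls."""
    return [(x1 + y1, x2 + y2) for x1, x2 in a_points for y1, y2 in b_points]

def support_function(points, phi):
    """Support function at angle phi of the convex hull of a finite list of points."""
    return max(x1 * math.cos(phi) + x2 * math.sin(phi) for x1, x2 in points)

square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]   # hypothetical example
segment = [(0.0, 0.0), (2.0, 1.0)]                            # hypothetical example

if __name__ == "__main__":
    total = minkowski_sum(square, segment)
    for phi in (0.0, 1.0, 2.0, 4.0):
        print(phi, support_function(total, phi),
              support_function(square, phi) + support_function(segment, phi))
```

In the printed output the last two columns agree up to rounding, which gives a numerical check that support functions add under Minkowski sums.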


* Problem 48. Prove that the Minkowski sum of two compact convex subsets of $\mathbb{R}^2$
is compact and convex.


Problem 49. Let $A$ and $B$ be two compact convex subsets of $\mathbb{R}^2$ having respective
support functions $h_A$ and $h_B$. Prove that the support function of $A + B$ is $h_A + h_B$.


Problem 50. Fix $\alpha \in (0, 2\pi)$. Calculate the support function of the compact
convex set, the boundary of which is the square with vertices at $(\cos\alpha, \sin\alpha)$,
$(-\sin\alpha, \cos\alpha)$, $(-\cos\alpha, -\sin\alpha)$, $(\sin\alpha, -\cos\alpha)$.




FIGURE 10.3. The Minkowski sum of a rectangular region and a disc 


Problem 51. By randomly choosing $\alpha$ in the preceding exercise according to the
uniform distribution on $[0, 2\pi)$, a random compact convex set, and thus a random
support function, is obtained. For each fixed $\varphi \in [0, 2\pi)$ calculate the distribution
of the value of the random support function at $\varphi$.


Problem 52. Consider an independent pair of random square regions, each having
the distribution described in the preceding exercise. Let

$$ \omega \mapsto [\varphi \mapsto H_\varphi(\omega)] $$

denote the random support function of their Minkowski sum. For each $\varphi \in [0, 2\pi)$
calculate the mean and variance of the $\mathbb{R}$-valued random variable $H_\varphi$.


CHAPTER 11 
Random Walk 


In this chapter, we will study certain sequences of random variables, known as 
‘random walks’. These are defined in terms of sums of independent identically 
distributed random variables. Important in the study of random walks (and 
of more general random sequences) are ‘filtrations’ and ‘stopping times’. A 
filtration is a sequence of $\sigma$-fields representing the information available at various
stages of an experiment. A stopping time is a $\overline{\mathbb{Z}}^+$-valued random variable whose
value may be regarded as the time at which an experiment is to be terminated. 
In applications, such as gambling theory, important stopping times are the time 
at which a random walk reaches a certain goal and the time at which it returns 
to its original position. These will be treated in the latter part of the chapter 
for several special random walks. 


11.1. Random sequences 


In our analysis of a sequence of random variables, it will often be useful to think of 
the sequence itself as a single random object. In Chapter 2, we briefly discussed 
this point of view in the $\mathbb{R}$-valued setting by stating that an infinite sequence of
$\mathbb{R}$-valued random variables is also a single $\mathbb{R}^\infty$-valued random variable.


Problem 1. Let $(\Psi, \mathcal{G})$ be a measurable space, and let $Y = (Y_0, Y_1, \dots)$ be a sequence
of $(\Psi, \mathcal{G})$-valued functions, all defined on the same probability space. Prove
that $Y$ is a $(\Psi, \mathcal{G})^\infty$-valued random variable if and only if $Y_n$ is a $(\Psi, \mathcal{G})$-valued random
variable for each $n$, and that in this case, $\sigma(Y) = \sigma(Y_0, Y_1, \dots)$. Also prove
that two $(\Psi, \mathcal{G})^\infty$-valued random variables $Y = (Y_0, Y_1, \dots)$ and $Z = (Z_0, Z_1, \dots)$
have the same distribution if and only if the random vectors $(Y_0, \dots, Y_n)$ and
$(Z_0, \dots, Z_n)$ have the same distribution for all $n$.


According to the preceding exercise, given a sequence $Y = (Y_0, Y_1, \dots)$ of
random variables, we may view $Y_n$ as the $n^{\text{th}}$ component of the $(\Psi, \mathcal{G})^\infty$-valued
random variable $Y$. One could use the notation $(Y(\omega))_n$ instead of $Y_n(\omega)$ in
order to emphasize this point of view, although we will not use such notation
even though it reflects the point of view that we often take. Thus, when we
write $Y_n(\omega)$, we may be thinking of first fixing $n$ and then looking at the random
variable $Y_n$, or we may have in mind first fixing $\omega$, and then looking at the $n^{\text{th}}$
term in the sequence $Y(\omega)$. The random object $Y$ is called a random sequence.
The coordinate index $n$ is often called the (discrete) time parameter or, more
briefly, (discrete) time.


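Problem 2. Let $Y$ be a random sequence of $(\Psi, \mathcal{G})$-valued random variables defined
on a probability space $(\Omega, \mathcal{F}, P)$. Let $\mathcal{K}$ be the collection of all subsets of $\mathbb{Z}^+$. Prove
that $(n, \omega) \mapsto Y_n(\omega)$ is a measurable function from $(\mathbb{Z}^+, \mathcal{K}) \times (\Omega, \mathcal{F})$ to $(\Psi, \mathcal{G})$.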


The preceding exercise introduces yet another point of view, namely that a
random sequence $Y$ is a measurable function of $(n, \omega) \in \mathbb{Z}^+ \times \Omega$. To emphasize
the fact that this type of measurability refers simultaneously to $n$ and $\omega$, we say
that $Y$ is jointly measurable. As can be seen from working the exercises, joint
measurability is a straightforward consequence of the measurability of each of
the individual random variables $Y_n$, so that it is a general property of random
sequences. In Part 6 we will encounter ‘random functions’, which arise when the
time parameter set is $\mathbb{R}^+$ instead of $\mathbb{Z}^+$. In that setting joint measurability is
not automatic, and some additional assumptions are needed.

The following proposition asserts that evaluating a random sequence at a 
random (time) coordinate gives a random variable. It is an easy consequence of 
joint measurability. (We have already used this result implicitly; see Problem 51 
of Chapter 9.) 


Proposition 1. Let $N$ be a $\mathbb{Z}^+$-valued random variable and let $Y$ be a random
sequence of $(\Psi, \mathcal{G})$-valued random variables, all defined on a probability space
$(\Omega, \mathcal{F}, P)$. Then

$$ \omega \mapsto Y_{N(\omega)}(\omega) $$

is a $(\Psi, \mathcal{G})$-valued random variable.


Problem 3. Use Problem 2 to prove the preceding proposition. 


If a sequence $Y = (Y_0, Y_1, \dots)$ is independent, and if all of the components
$Y_n$ have the same distribution, then we call $Y$ an iid random sequence, or simply
an iid sequence. (The letters ‘iid’ stand for “independent and identically
distributed”.) Such sequences are quite important in probability theory, and in
particular, they constitute the basic ingredient in the definition of random walks.


11.2. Definition and examples 


We introduce a famous type of random sequence. 




Definition 2. Let $X = (X_1, X_2, \dots)$ be an iid sequence of random variables
each taking its values in $\mathbb{R}^d$ or $\overline{\mathbb{R}}^+$. The sequence $S = (S_0, S_1, S_2, \dots)$, where
$S_0(\omega) = 0$ for all $\omega$ and

$$
S_n = \sum_{k=1}^{n} X_k ,
\tag{11.1}
$$

is a random walk with steps $X_1, X_2, \dots$. The common distribution of the steps
is the step distribution.


We see that a random walk is the sequence of partial sums of an iid sequence.
We will typically take the point of view that $S_0$ is the empty sum, so that its
equaling 0 need not be mentioned explicitly. We may then use (11.1) to define
$S_n$ for all $n \ge 0$.
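
The passage from steps to partial sums is a single call to `itertools.accumulate` in Python. The sketch below handles a finite initial segment of real-valued steps only, and the function name `random_walk` is an assumption made for illustration.

```python
from itertools import accumulate

def random_walk(steps):
    """Return [S_0, S_1, ..., S_n] for a finite list of steps [X_1, ..., X_n]."""
    # S_0 is the empty sum; accumulate yields the partial sums S_1, ..., S_n.
    return [0] + list(accumulate(steps))

if __name__ == "__main__":
    print(random_walk([1, -1, -1, 2]))
```

With an empty list of steps the function returns the single state $S_0 = 0$, in line with the empty-sum convention.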

The space $\mathbb{R}^d$ or $\overline{\mathbb{R}}^+$ in which the various $S_n$ take their values is called the
state space of $S$, and, for instance, the phrase “$S$ is a random walk in $\mathbb{R}^d$”
indicates that the state space is $\mathbb{R}^d$. For random walks in $\mathbb{R}^d$ whose steps have
integral coordinates it is natural to call $\mathbb{Z}^d$ the state space, and similarly for $\overline{\mathbb{R}}^+$
and $\overline{\mathbb{Z}}^+$.

Let $X = (X_1, X_2, \dots)$ be the sequence of steps of a random walk $S$ with
step distribution $Q$. The random sequence $X$ has distribution $Q^\infty$. Since $S$
is the composition of a measurable function and $X$ (can you prove this?), the
distribution of $S$ is determined by $Q$. Thus, we may speak of ‘the’ random
walk with step distribution $Q$, even though there are many possible underlying
probability spaces $(\Omega, \mathcal{F}, P)$. If, say, the state space is $\mathbb{R}$, a canonical choice for
$\Omega$ is $\mathbb{R}^\infty$. Then we may take $X$ to be the identity function on $\Omega = \mathbb{R}^\infty$ and $S$ a
certain measurable function from $\mathbb{R}^\infty$ to $\mathbb{R}^\infty$.


Problem 4. Write down the measurable function to which the preceding sentence 
refers. 


Problem 5. Let $S$ be a random walk with step sequence $X$. For integers $0 \le i < j$,
define

$$ S_{i,j} = X_{i+1} + \dots + X_j . $$

Thus, $S_n = S_{0,n}$. Prove that for all positive integers $k$, the doubly indexed sequences

$$ (S_{i,j} : 0 \le i < j) \quad \text{and} \quad (S_{i+k,\,j+k} : 0 \le i < j) $$

have the same distribution. (The random variables $S_{i,j}$ are called the increments
of $S$, and the result just stated is often expressed by saying that $S$ has stationary
increments.) Hint: Express the first doubly indexed sequence as a certain function
$\varphi$ of the step sequence $X$. The second doubly indexed sequence is $\varphi \circ Y$, where
$Y = (X_{k+1}, X_{k+2}, \dots)$. Use the fact that $X$ and $Y$ have the same distribution.




Problem 6. This is a continuation of the preceding problem. Show that for integers

$$ 0 \le i_1 < j_1 \le i_2 < j_2 \le \dots \le i_n < j_n , $$

the collection of increments

$$ \{ S_{i_m, j_m} : 1 \le m \le n \} $$

is independent. (This property is often expressed by saying that $S$ has independent
increments.)


Example 1. [Simple Random Walks in $\mathbb{Z}^d$] A simple random walk in $\mathbb{Z}^d$ is a
random walk with state space $\mathbb{Z}^d$ (or $\mathbb{R}^d$) whose step distribution $Q$ is supported
by the set $\mathcal{N} = \{\pm e_i : i = 1, \dots, d\}$, where $e_i$, $i = 1, \dots, d$, are the standard basis
vectors in $\mathbb{R}^d$. A simple random walk on $\mathbb{Z}^d$ is also called a nearest neighbor
random walk on $\mathbb{Z}^d$, because successive states $S_n(\omega)$ and $S_{n+1}(\omega)$ are always
one unit apart (‘nearest neighbors’). The first eleven states of a simple random
walk in $\mathbb{Z}^2$ are illustrated in Figure 11.1. If $Q$ is the distribution which assigns
equal probability to each of the $2d$ points in $\mathcal{N}$, then the random walk is called
a simple symmetric random walk on $\mathbb{Z}^d$. In particular, the simple symmetric
random walk on $\mathbb{Z}$ makes steps of size 1 in either the positive or the negative
direction, each with probability 1/2.


FIGURE 11.1. A simple random walk in $\mathbb{Z}^2$
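
A simple symmetric random walk of the kind described in Example 1 can be simulated in a few lines of Python. Each step picks one of the $d$ coordinate directions uniformly and then a sign with probability 1/2, so each of the $2d$ possible steps has probability $1/(2d)$. The function name, the seed, and the number of steps are assumptions made for illustration.

```python
import random

def simple_symmetric_walk(d, n, rng):
    """First n + 1 states of a simple symmetric random walk in Z^d, as tuples."""
    state = [0] * d
    path = [tuple(state)]
    for _ in range(n):
        i = rng.randrange(d)          # which basis vector, uniform over d choices
        sign = rng.choice((-1, 1))    # positive or negative direction, equally likely
        state[i] += sign
        path.append(tuple(state))
    return path

if __name__ == "__main__":
    # Eleven states of a walk in the plane, as in the figure.
    print(simple_symmetric_walk(2, 10, random.Random(0)))
```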


Problem 7. Describe the relation between Example 2 of Chapter 2 and the simple 
symmetric random walk in Z. 


Problem 8. Let $S$ be a simple random walk in $\mathbb{Z}$, with step distribution $Q$. Show
that for each $n$, the distribution of $S_n$ is of the same type as the binomial distribution
with parameters $n$ and $p$, where $p = Q(\{1\})$.




Remark 1. In view of Remark 2 of Chapter 10, random walks can be defined
in arbitrary commutative semigroups. Although our main interest is in the
two state spaces $\mathbb{R}^d$ and $\overline{\mathbb{R}}^+$, most assertions in this chapter and later, such as
Theorem 12, are for state spaces that are arbitrary commutative semigroups.
The semigroups $\overline{\mathbb{R}}^+$ and $\overline{\mathbb{Z}}^+$ are special because they arise naturally as state
spaces in the study of random walks in $\mathbb{R}$.

There are some general results for random walks that require the state space to
be a group, not just a semigroup (for example, see Problem 15). For simplicity,
we will state such results in the $\mathbb{R}^d$-setting only, even though they hold for
arbitrary commutative groups.


The next example treats a random walk in a group different from $\mathbb{R}^d$, the
group $(-\pi, \pi]$ of rotations that was introduced in Problem 11 of Chapter 10.


Example 2. In the set $(-\pi, \pi]$ with addition defined as in Problem 11 of Chapter 10 consider the random walk $S$ with step distribution $Q$ given by

$$Q(\{\tfrac{2\pi}{5}\}) = Q(\{-\tfrac{2\pi}{5}\}) = \tfrac{1}{2}.$$


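Let us calculate the distribution of $S_n$ for each $n$. It is clear from symmetry that, for each $n$, there exist nonnegative numbers $p_n, q_n, r_n$ such that $p_n + 2q_n + 2r_n = 1$ and

$$\begin{aligned}
p_n &= P(\{\omega : S_n(\omega) = 0\}), \\
q_n &= P(\{\omega : S_n(\omega) = \tfrac{2\pi}{5}\}) = P(\{\omega : S_n(\omega) = -\tfrac{2\pi}{5}\}), \\
r_n &= P(\{\omega : S_n(\omega) = \tfrac{4\pi}{5}\}) = P(\{\omega : S_n(\omega) = -\tfrac{4\pi}{5}\}).
\end{aligned}$$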


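Clearly, $p_0 = 1$ and $q_0 = r_0 = 0$. For $n \ge 1$ the distribution of $S_n$ is the convolution of the distributions of $X_n$ and $S_{n-1}$, where $X_n$ denotes the $n$th step. Thus, we have

$$\begin{aligned}
r_n ={}& P(\{\omega : X_n(\omega) = 0\})\, P(\{\omega : S_{n-1}(\omega) = \tfrac{4\pi}{5}\}) \\
&+ P(\{\omega : X_n(\omega) = \tfrac{2\pi}{5}\})\, P(\{\omega : S_{n-1}(\omega) = \tfrac{2\pi}{5}\}) \\
&+ P(\{\omega : X_n(\omega) = \tfrac{4\pi}{5}\})\, P(\{\omega : S_{n-1}(\omega) = 0\}) \\
&+ P(\{\omega : X_n(\omega) = -\tfrac{4\pi}{5}\})\, P(\{\omega : S_{n-1}(\omega) = -\tfrac{2\pi}{5}\}) \\
&+ P(\{\omega : X_n(\omega) = -\tfrac{2\pi}{5}\})\, P(\{\omega : S_{n-1}(\omega) = -\tfrac{4\pi}{5}\}) \\
={}& 0 + \tfrac{1}{2} q_{n-1} + 0 + 0 + \tfrac{1}{2} r_{n-1}.
\end{aligned}$$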


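Similar calculations give expressions for $p_n$ and $q_n$. The result, for $n \ge 1$, is, in matrix notation,

$$\begin{pmatrix} p_n \\ q_n \\ r_n \end{pmatrix}
= \begin{pmatrix} 0 & 1 & 0 \\ \tfrac12 & 0 & \tfrac12 \\ 0 & \tfrac12 & \tfrac12 \end{pmatrix}
\begin{pmatrix} p_{n-1} \\ q_{n-1} \\ r_{n-1} \end{pmatrix}
= \begin{pmatrix} 0 & 1 & 0 \\ \tfrac12 & 0 & \tfrac12 \\ 0 & \tfrac12 & \tfrac12 \end{pmatrix}^{n}
\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}.$$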




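We compute the $n$th power of the matrix by diagonalizing:

$$\begin{pmatrix} 0 & 1 & 0 \\ \tfrac12 & 0 & \tfrac12 \\ 0 & \tfrac12 & \tfrac12 \end{pmatrix}^{n}
= \frac{1}{20}
\begin{pmatrix} 4 & 4 & 4 \\ 4 & -1+\sqrt5 & -1-\sqrt5 \\ 4 & -1-\sqrt5 & -1+\sqrt5 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & \left(\frac{-1+\sqrt5}{4}\right)^{n} & 0 \\ 0 & 0 & \left(\frac{-1-\sqrt5}{4}\right)^{n} \end{pmatrix}
\begin{pmatrix} 1 & 2 & 2 \\ 2 & -1+\sqrt5 & -1-\sqrt5 \\ 2 & -1-\sqrt5 & -1+\sqrt5 \end{pmatrix}.$$

The appropriate matrix arithmetic gives

$$\begin{aligned}
p_n &= \frac15 + \frac25\left(\frac{-1+\sqrt5}{4}\right)^{n} + \frac25\left(\frac{-1-\sqrt5}{4}\right)^{n}, \\
q_n &= \frac15 + \frac25\left(\frac{-1+\sqrt5}{4}\right)^{n+1} + \frac25\left(\frac{-1-\sqrt5}{4}\right)^{n+1}, \\
r_n &= \frac15 - \frac1{10}\left(\frac{-1+\sqrt5}{4}\right)^{n-1} - \frac1{10}\left(\frac{-1-\sqrt5}{4}\right)^{n-1}.
\end{aligned}$$

It is interesting to note that $p_n, q_n, r_n$ all approach $\tfrac15$ as $n \to \infty$.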


Problem 9. Use the result of the preceding example to obtain some formulas for 
certain sums of binomial coefficients. 


Problem 10. Modify Example 2 and Problem 9 by defining Q by 
ada p= Fs 
Problem 11. Modify Example 2 and Problem 9 by defining Q by 


QUID = Q({-F)) =z. 


Find a qualitative difference between this random walk and the random walks 
described in Example 2 and Problem 10. 


* Problem 12. In the set $(-\pi, \pi]$ with addition defined as in Problem 11 of Chapter 10 consider the random walk $S$ with step distribution $Q$ given by


Q({0}) = Q({7}) = 3 
and for all Borel sets $B \subseteq (-\pi, \pi] \setminus \{0, \pi\}$,
Q(B) = g(B), 


where $\lambda$ denotes Lebesgue measure. Calculate the distribution of $S_n$ for $n \ge 1$.




* Problem 13. Let $p, q, r$ denote positive numbers whose sum equals 1. Let $Q$ be the distribution on $\overline{\mathbb{Z}}^+$ given by $Q(\{1\}) = p$, $Q(\{2\}) = q$ and $Q(\{\infty\}) = r$. Consider a random walk $S$ in $\overline{\mathbb{Z}}^+$ (or $\overline{\mathbb{R}}^+$) whose step distribution equals $Q$, and let $N(\omega) = \inf\{n : S_n(\omega) = \infty\}$. Prove that $N < \infty$ a.s., so that $S_{N-1}$ is a.s.-defined. Calculate the distribution of the random vector $(N - 1, S_{N-1})$ and the expected value of $S_{N-1}$.


* Problem 14. Let $Q$ be a distribution on $\overline{\mathbb{R}}^+$ for which $0 < Q(\{\infty\}) < 1$. Let $S$ be the random walk with step distribution $Q$ and define $N$ as in the preceding exercise. Follow the instructions in the last two sentences of that exercise, except that here one cannot expect such an explicit formula for the distribution of $(N - 1, S_{N-1})$ as in that very specific situation; convolutions may appear in the answer.


Example 3. Consider a random walk $S$ in $\overline{\mathbb{R}}^+$ having exponential step distribution with mean 1. By Problem 9 of Chapter 10, for $k = 1, 2, \dots$, $S_k$ is gamma distributed with parameters $1, k$.

Fix $\beta \in (0, \infty)$ and let

$$N(\omega) = \#\{n : 0 < S_n(\omega) \le \beta\}.$$

Since each step is a.s. positive, $N$ a.s. equals the number of steps taken by the random walk before it reaches the interval $(\beta, \infty)$.
For $k \in \mathbb{Z}^+$,

$$\begin{aligned}
P(\{\omega : N(\omega) = k\}) &= P(\{\omega : N(\omega) \ge k\}) - P(\{\omega : N(\omega) \ge k + 1\}) \\
&= P(\{\omega : S_k(\omega) \le \beta\}) - P(\{\omega : S_{k+1}(\omega) \le \beta\}) \\
&= \int_0^\beta \left( \frac{x^{k-1} e^{-x}}{\Gamma(k)} - \frac{x^{k} e^{-x}}{\Gamma(k+1)} \right) dx \\
&= \left. \frac{x^{k} e^{-x}}{\Gamma(k+1)} \right|_0^\beta = \frac{\beta^{k} e^{-\beta}}{k!}.
\end{aligned}$$

Therefore, the random variable $N$ is Poisson distributed with mean $\beta$. This fact will play a role in the study of 'Poisson point processes' in Chapter 29.
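
The conclusion of Example 3 lends itself to a quick Monte Carlo check. The following sketch is ours and is only illustrative: the helper names, the seed, the number of trials, and the choice $\beta = 2$ in the usage note are assumptions, and agreement with the Poisson distribution is only approximate.

```python
import math
import random

def count_steps_up_to(beta, rng):
    """Number of n >= 1 with 0 < S_n <= beta for a walk with exponential(1) steps."""
    total = rng.expovariate(1.0)       # S_1
    count = 0
    while total <= beta:
        count += 1
        total += rng.expovariate(1.0)  # next partial sum
    return count

def empirical_pmf(beta, trials=20000, seed=1):
    """Empirical distribution of N over independent simulated walks."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(trials):
        k = count_steps_up_to(beta, rng)
        counts[k] = counts.get(k, 0) + 1
    return {k: c / trials for k, c in counts.items()}

def poisson_pmf(k, beta):
    """Poisson probability beta^k e^{-beta} / k!."""
    return math.exp(-beta) * beta ** k / math.factorial(k)
```

For instance, comparing `empirical_pmf(2.0)` with `poisson_pmf(k, 2.0)` for small `k` should show agreement up to sampling error.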


If a random sequence has the same distribution as a random walk, then it deserves the name 'random walk', even if its description or construction does not explicitly involve iid steps. In the $\mathbb{R}^d$-setting, the steps can be obtained from the original sequence by subtraction, so a random sequence $(S_0 = 0, S_1, S_2, \dots)$ of $\mathbb{R}^d$-valued random variables has the same distribution as a random walk if and only if the random sequence $((S_n - S_{n-1}) : n = 1, 2, \dots)$ is iid. Since subtraction is not universally defined in $\overline{\mathbb{R}}^+$, the story there is somewhat subtle.


Proposition 3. Let $T = (T_0 = 0, T_1, T_2, \dots)$ be a sequence of $\overline{\mathbb{R}}^+$-valued random variables. Then $T$ has the same distribution as a random walk in $\overline{\mathbb{R}}^+$ if and only if

$$P(\{\omega : T_k(\omega) - T_{k-1}(\omega) \in B_k \text{ for } 1 \le k \le n\}) = \prod_{k=1}^{n} P(\{\omega : T_1(\omega) \in B_k\})$$

for every choice of the positive integer $n$ and Borel subsets $B_k$ of $\mathbb{R}^+$, with the understanding that the undefined expression $\infty - \infty$ is not a member of $\mathbb{R}^+$. In this case, the step distribution of the random walk is the distribution of $T_1$.


PROOF. The 'only if' assertion is obvious. For the 'if' assertion, let $(\Omega, \mathcal{F}, P_1)$ denote the probability space on which $T$ is defined. Let $Y = (Y_j : j \ge 1)$ be an iid sequence defined on another probability space $(\Psi, \mathcal{G}, P_2)$, with the common distribution of the $Y_j$ equal to the distribution of $T_1$. Set

$$(\Theta, \mathcal{H}, P) = (\Omega, \mathcal{F}, P_1) \times (\Psi, \mathcal{G}, P_2).$$

We may regard each $Y_j$ as defined on $\Theta$ as follows:

$$Y_j((\omega, \psi)) = Y_j(\psi).$$

Similarly we may regard $T$ as defined on $\Theta$ via its values at first coordinates of members of $\Theta$. We have thus arranged for $Y$ to be independent of $T$.
For $k \ge 1$ define random variables $Z_k$ on $(\Theta, \mathcal{H}, P)$ via

$$Z_k(\theta) = \begin{cases} T_k(\theta) - T_{k-1}(\theta) & \text{if } T_{k-1}(\theta) < \infty \\ Y_k(\theta) & \text{if } T_{k-1}(\theta) = \infty. \end{cases}$$

Clearly $T_n(\theta) = \sum_{k=1}^{n} Z_k(\theta)$ for all $n$ and $\theta$, and the distributions of $Z_1$, $Y_1$, and $T_1$ are identical. Thus we only need prove that $(Z_k : k \ge 1)$ is an iid sequence.

In view of Proposition 5 of Chapter 9 we only need prove that $Z_n$ is independent of $(Z_1, \dots, Z_{n-1})$ and has the same distribution as $Z_1$ for $n = 2, 3, \dots$. Hence we may finish the proof by showing

$$P(\{\theta : Z_n(\theta) \in B_n,\ (Z_1(\theta), \dots, Z_{n-1}(\theta)) \in C_{n-1}\})
= P(\{\theta : Z_1(\theta) \in B_n\})\, P(\{\theta : (Z_1(\theta), \dots, Z_{n-1}(\theta)) \in C_{n-1}\}) \tag{11.2}$$

for every choice of $n \ge 2$, Borel subset $B_n$ of $\overline{\mathbb{R}}^+$, and Borel subset $C_{n-1}$ of $(\overline{\mathbb{R}}^+)^{n-1}$.


For fixed $n$ and $B_n$ the collection of $C_{n-1}$ for which (11.2) holds is a Sierpinski class, by additivity of probability measures and continuity of measure. Thus, we only need consider sets $C_{n-1}$ of the form

$$C_{n-1} = B_1 \times B_2 \times \cdots \times B_{n-1},$$

where each $B_k$ is either the one-point set $\{\infty\}$ or a Borel subset of $\mathbb{R}^+$. In case $B_k \subseteq \mathbb{R}^+$ for $k \le n$, (11.2) is an immediate consequence of the hypothesis in the theorem, so we assume that there exists $q \le n$ such that $B_q = \{\infty\}$ but $B_k \subseteq \mathbb{R}^+$ for $k < q$. If $q < n$, then $Z_n$ can be replaced by $Y_n$ on the left side of (11.2), and then (11.2) follows from the fact that $Y_n$ has the same distribution as $Z_1$ and is independent of $(T, Y_1, \dots, Y_{n-1})$. Therefore we only need consider $q = n$. In this case the left side of (11.2) equals

$$P(\{\theta : T_k(\theta) - T_{k-1}(\theta) \in B_k \text{ for } k < n\})
- P(\{\theta : T_k(\theta) - T_{k-1}(\theta) \in B_k \text{ for } k < n \text{ and } T_n(\theta) - T_{n-1}(\theta) < \infty\}),$$

which by the hypothesis of the theorem, equals

$$\begin{aligned}
& P(\{\theta : T_k(\theta) - T_{k-1}(\theta) \in B_k \text{ for } k < n\})\, [1 - P(\{\theta : T_n(\theta) - T_{n-1}(\theta) < \infty\})] \\
&\quad = P(\{\theta : T_k(\theta) - T_{k-1}(\theta) \in B_k \text{ for } k < n\})\, P(\{\theta : T_n(\theta) - T_{n-1}(\theta) = \infty\}),
\end{aligned}$$

as desired. $\square$


11.3. Filtrations and stopping times 


One might observe a random sequence $S$ over time, with $S_n$ being the observation made at time $n$. The $\sigma$-field $\sigma(S_0, S_1, \dots, S_n)$ represents the information accumulated up to and including time $n$ by making such observations. The following definition introduces a sequence of $\sigma$-fields designed to represent information accumulated over time.


Definition 4. Let $(\Omega, \mathcal{F})$ be a measurable space. A sequence $(\mathcal{F}_0, \mathcal{F}_1, \mathcal{F}_2, \dots)$ of sub-$\sigma$-fields of $\mathcal{F}$ is a filtration in $(\Omega, \mathcal{F})$ if it is increasing, that is, if

$$\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \mathcal{F}_2 \subseteq \cdots.$$

A sequence $Y = (Y_0, Y_1, \dots)$ of measurable functions defined on $(\Omega, \mathcal{F})$ is said to be adapted to a filtration $(\mathcal{F}_n : n \ge 0)$ if, for each $n$, $Y_n$ is measurable with respect to $\mathcal{F}_n$. Corresponding to a filtration $(\mathcal{F}_n : n \ge 0)$, we use the notation

$$\mathcal{F}_\infty = \sigma(\mathcal{F}_n : n \ge 0).$$

The minimal filtration $(\mathcal{F}_0, \mathcal{F}_1, \dots)$ of a random sequence $Y = (Y_0, Y_1, \dots)$ is given by $\mathcal{F}_n = \sigma(Y_0, \dots, Y_n)$. The random sequence $Y$ is clearly adapted to its minimal filtration. The minimal filtration is the one that is implicit in any discussion for which a filtration is required and none has been explicitly mentioned.


Problem 15. Let $S$ be a random walk in $\mathbb{R}^d$ and denote the corresponding sequence of steps by $X$. Prove that

$$\sigma(X_1, \dots, X_n) = \sigma(S_0, S_1, \dots, S_n)$$

for each positive integer $n$. Is the same conclusion true for $n = 0$ if, in that case, the $\sigma$-field on the left is interpreted to be the smallest of all $\sigma$-fields? Give an example to show that $\sigma(X_1, \dots, X_n)$ and $\sigma(S_0, S_1, \dots, S_n)$ need not be equal for random walks in $\overline{\mathbb{R}}^+$.




Problem 16. Prove that if a random sequence $Y$ is adapted to a filtration $(\mathcal{F}_n : n \ge 0)$, then $\sigma(Y) \subseteq \mathcal{F}_\infty$, with equality in case $(\mathcal{F}_n : n \ge 0)$ is the minimal filtration of $Y$.


The next definition introduces a class of random variables that is fundamental 
to the study of random sequences. 


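Definition 5. Let $(\Omega, \mathcal{F})$ be a measurable space, and let $(\mathcal{F}_n : n \ge 0)$ be a filtration of sub-$\sigma$-fields of $\mathcal{F}$. A $\overline{\mathbb{Z}}^+$-valued measurable function $N$ defined on $(\Omega, \mathcal{F})$ is a stopping time with respect to this filtration if

$$\{\omega : N(\omega) \le n\} \in \mathcal{F}_n$$

for all $n \in \overline{\mathbb{Z}}^+$.


* Problem 17. Show that $N$ is a stopping time with respect to a filtration $(\mathcal{F}_n : n \ge 0)$ if and only if $\{\omega : N(\omega) = n\} \in \mathcal{F}_n$ for all $n \in \overline{\mathbb{Z}}^+$.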


It is important to develop intuition for filtrations and stopping times. A 
filtration represents the information obtained by observing an experiment up to 
time n. One may think of a stopping time N as the time at which the observations 
of the experiment are to be stopped. The definition of stopping time requires that 
the decision to stop observing at a certain time n be based on the information 
available up to time n. Of course, the definition does not require that there 
actually be an observer who stops observing at time n; the term ‘stopping time’ 
is used regardless of whether such an interpretation is intended. 
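
The requirement that the decision to stop at time $n$ use only information available up to time $n$ can be illustrated on a single sample path. In the sketch below (ours, with hypothetical function names and an arbitrary example path), the event that a first hitting time is at most $n$ is decided from the prefix $Y_0, \dots, Y_n$ alone, whereas a last visit time changes when the future of the path changes.

```python
def hitting_time(path, target):
    """First index n with path[n] in target; None plays the role of +infinity."""
    for n, y in enumerate(path):
        if y in target:
            return n
    return None

def stopped_by(prefix, target):
    """Decide whether the hitting time is <= n using only Y_0, ..., Y_n (the prefix)."""
    return any(y in target for y in prefix)

def last_visit(path, target):
    """Last index n with path[n] in target; it depends on the entire path."""
    visits = [n for n, y in enumerate(path) if y in target]
    return visits[-1] if visits else 0

path = [0, 1, 0, -1, -2, -1, 0, 1, 2, 3]
B = {2, -2}
N = hitting_time(path, B)      # first hit of B, here at the state -2
```

Replacing the last two entries of `path` leaves `stopped_by(path[:n + 1], B)` unchanged for every `n` up to 7, but it can change `last_visit(path, B)`, which is why a last visit time is in general not a stopping time.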


Problem 18. In this problem the infimum of the empty set is defined to be $+\infty$ as usual, and the supremum of the empty set is defined, somewhat unconventionally, to be 0. Let $Y$ be a sequence of $\mathbb{R}$-valued random variables. Show that for all Borel sets $B \subseteq \mathbb{R}$,

$$N(\omega) = \inf\{n \ge 0 : Y_n(\omega) \in B\}$$

is a stopping time with respect to the minimal filtration of $Y$, but that

$$M(\omega) = \sup\{n : Y_n(\omega) \in B\}$$

is in general not, even if it is assumed that, for every $\omega$, $\{n : Y_n(\omega) \in B\} \ne \emptyset$. Hint: One approach for the first part is to use Problem 17.


Problem 19. Let S and N be as in Example 3. Is N a stopping time with respect 
to the minimal filtration of S? 


The random variable N in the preceding exercise is called the hitting time of 
the set B. Many (but certainly not all) stopping times are hitting times. 




Proposition 6. Let $N_1, N_2, \dots$ be stopping times with respect to a filtration. Then $N_1 \wedge N_2$, $N_1 \vee N_2$, $N_1 + N_2$, $\limsup_{k \to \infty} N_k$, and $\liminf_{k \to \infty} N_k$ are also stopping times with respect to that filtration, as is any constant $n \in \overline{\mathbb{Z}}^+$, regarded as a constant random variable.


Problem 20. Prove the preceding proposition. Also provide an example of two stopping times $N_1 \le N_2$ with respect to some filtration, such that $N_2 - N_1$ is not a stopping time with respect to that filtration.


When one observes a random sequence $Y$ up to a random time $N$, one obtains a certain amount of information. If $Y$ is adapted to a filtration $(\mathcal{F}_n : n \ge 0)$ and $N$ is a stopping time with respect to that filtration, then, according to the following definition, this information is contained in a $\sigma$-field that we call $\mathcal{F}_N$.


Definition and Proposition 7. Let $N$ be a stopping time with respect to a filtration $(\mathcal{F}_n : n \ge 0)$. The collection of events $A$ such that

$$A \cap \{\omega : N(\omega) \le n\} \in \mathcal{F}_n$$

for all $n \in \overline{\mathbb{Z}}^+$ is a $\sigma$-field. It is denoted by $\mathcal{F}_N$.


Problem 21. Prove the preceding proposition. 


Problem 22. What is wrong with defining
$$\mathcal{F}_N = \sigma(\mathcal{F}_n : n \le N)?$$


Proposition 8. A stopping time $N$ is measurable with respect to the $\sigma$-field $\mathcal{F}_N$.


Problem 23. Prove the preceding proposition. 


* Problem 24. Prove that if $M \le N$ are two stopping times with respect to a common filtration $(\mathcal{F}_n : n \ge 0)$, then $\mathcal{F}_M \subseteq \mathcal{F}_N$.


We have already seen in Proposition 1 that if $Y$ is a sequence of random variables and $N$ is a $\mathbb{Z}^+$-valued random variable, then $Y_N$ is a random variable. In the case of stopping times, we can say more, provided we introduce an appropriate convention for $Y_\infty$.


Proposition 9. Let $Y$ be a random sequence adapted to a filtration $(\mathcal{F}_n : n \ge 0)$, and let $N$ be a stopping time with respect to the same filtration. Let $Y_\infty$ denote any random variable that is measurable with respect to $\mathcal{F}_\infty$. Then $Y_N$ is measurable with respect to $\mathcal{F}_N$.




Problem 25. Prove the preceding proposition. Hint: Break up events involving $Y_N$ into disjoint pieces, according to the value of $N$.


11.4. Stopping times and random walks 


The main reason that stopping times are useful in the study of random walks is 
contained in the following result, which says that the iid property persists after 
a stopping time. 


Proposition 10. Let $X = (X_n : n \ge 1)$ be an iid sequence of $(\Psi, \mathcal{G})$-valued random variables, where $(\Psi, \mathcal{G})$ is a measurable space. Let $(\mathcal{F}_n : n \ge 1)$ be the corresponding minimal filtration, and let $N$ be an a.s. finite stopping time with respect to this filtration. For $n \ge 1$, define

$$Y_n(\omega) = X_{N(\omega)+n}(\omega).$$

Then the sequence $Y = (Y_1, Y_2, \dots)$ has the same distribution as $X$, and is independent of $\mathcal{F}_N$.


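PROOF. It follows from Proposition 1 that $Y$ is a $(\Psi, \mathcal{G})^\infty$-valued random sequence. Fix $A \in \mathcal{G}^\infty$ and $B \in \mathcal{F}_N$. Then

$$P(B \cap \{\omega : Y(\omega) \in A\}) = \sum_{m=0}^{\infty} P(B \cap \{\omega : N(\omega) = m\} \cap \{\omega : X^{(m)}(\omega) \in A\}),$$

where $X^{(m)}$ is the random sequence $(X_{m+1}, X_{m+2}, \dots)$. By Problem 17,

$$\{\omega : N(\omega) = m\} \in \mathcal{F}_m,$$

and, by the definition of $\mathcal{F}_N$,

$$B \cap \{\omega : N(\omega) \le m\} \in \mathcal{F}_m.$$

Therefore, the intersection of these two events, $B \cap \{\omega : N(\omega) = m\}$, is a member of $\mathcal{F}_m$. The random sequence $X^{(m)}$ is independent of $\mathcal{F}_m$ and has the same distribution as $X$, so

$$\begin{aligned}
P(B \cap \{\omega : N(\omega) = m\} \cap \{\omega : X^{(m)}(\omega) \in A\})
&= P(B \cap \{\omega : N(\omega) = m\})\, P(\{\omega : X^{(m)}(\omega) \in A\}) \\
&= P(B \cap \{\omega : N(\omega) = m\})\, P(\{\omega : X(\omega) \in A\}).
\end{aligned}$$

Now sum on $m$ to obtain

$$P(B \cap \{\omega : Y(\omega) \in A\}) = P(B)\, P(\{\omega : X(\omega) \in A\}) = P(B)\, Q^\infty(A),$$

where $Q$ is the common distribution of the $X_j$'s. By letting $B = \Omega$, we find that $Y$ has the same distribution as $X$, namely $Q^\infty$. It then follows that $B$ and $Y$ are independent. Since $B$ is an arbitrary member of $\mathcal{F}_N$, $Y$ is independent of $\mathcal{F}_N$. $\square$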


By Proposition 8 and Proposition 9, $N$ and $X_N$ are measurable with respect to $\mathcal{F}_N$, and so, by the preceding proposition, the random vector $(N, X_N)$ is independent of $Y$.
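
A small simulation may help to show why the stopping time hypothesis in Proposition 10 matters. The sketch below is ours: it takes $\pm 1$ steps with $P(X_n = 1) = p$, lets $N$ be the first $n$ with $X_n = 1$ (a stopping time), and compares the step following $N$ with the step following $M = N - 1$ (not a stopping time). The function name, seed, sequence length, and trial count are assumptions made for illustration.

```python
import random

def experiment(p=0.3, length=60, trials=20000, seed=2):
    """Estimate P(X_{N+1} = 1) and P(X_{M+1} = 1), where N is the first n with
    X_n = 1 and M = N - 1."""
    rng = random.Random(seed)
    hits_after_n = 0
    hits_after_m = 0
    used = 0
    for _ in range(trials):
        # xs[k - 1] holds the step X_k, generated before N is examined.
        xs = [1 if rng.random() < p else -1 for _ in range(length)]
        if 1 not in xs[:-1]:
            continue                       # N too large for this finite sample (very rare)
        n = xs.index(1) + 1                # N = first n with X_n = +1
        used += 1
        hits_after_n += (xs[n] == 1)       # X_{N+1}
        hits_after_m += (xs[n - 1] == 1)   # X_{M+1} = X_N, which is always +1
    return hits_after_n / used, hits_after_m / used
```

The first returned frequency should be close to `p`, as the proposition predicts, while the second is exactly 1: looking one step into the future destroys the iid property of the subsequent steps.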


Corollary 11. Let $(\mathcal{F}_n : n \ge 0)$ be the minimal filtration of a random walk $S$, and let $N$ be an a.s. finite stopping time with respect to this filtration. Denote by $X$ the step sequence of $S$. Then

$$(X_{N+1},\ [X_{N+1} + X_{N+2}],\ [X_{N+1} + X_{N+2} + X_{N+3}],\ \dots)$$

is a random walk that has the same distribution as $S$ and is independent of $\mathcal{F}_N$.


The next problem, which should be compared with Problem 6 where it was shown that a random walk has independent increments, transforms the preceding corollary into a statement about the increments of a random walk.


Problem 26. Let $N_1 \le N_2 \le \dots$ be a sequence of a.s. finite stopping times with respect to the minimal filtration of a random walk $S$ in $\mathbb{R}^d$. Prove that the random variables

$$S_{N_1},\ S_{N_2} - S_{N_1},\ S_{N_3} - S_{N_2},\ \dots$$

are independent.


11.5. A hitting-time example 


This section treats hitting times of simple random walks in Z. 


Example 4. Consider a simple random walk $S$ in $\mathbb{Z}$ and set

$$p = P(\{\omega : S_1(\omega) = 1\}) = 1 - P(\{\omega : S_1(\omega) = -1\}).$$

To avoid trivialities we assume that $p \in (0, 1)$. Fix a positive integer $c$ and for $b = 0, 1, \dots, c$ let $N_b$ denote the hitting time of the two-point set $\{b - c, b\}$. For $c = 6$ the random variable $N_2$ is illustrated in Figure 11.2. We set the goal of calculating the probability generating function $\rho_b$ of $N_b$.

Since $N_0 = N_c = 0$, $\rho_0(s) = \rho_c(s) = 1$ for $0 \le s \le 1$. Since $N_b > 0$ for $0 < b < c$, $\rho_b(0) = 0$.

Denote the sequence of steps by $X = (X_1, X_2, \dots)$, and let $\tilde{S}$ be the random walk obtained by deleting the first step of $S$; thus, $\tilde{S}_0 = 0$, $\tilde{S}_1 = X_2$, $\tilde{S}_2 = X_2 + X_3, \dots$. Let $\tilde{N}_b$ equal the hitting time of the set $\{b - c, b\}$ by $\tilde{S}$. The random walks $S$ and $\tilde{S}$ are identically distributed, as are the hitting times $N_b$ and $\tilde{N}_b$.




FIGURE 11.2. The hitting time of {—4, 2} 


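For $0 < b < c$ and $n > 0$,

$$\begin{aligned}
P(\{\omega : N_b(\omega) = n\})
&= P(\{\omega : S_1(\omega) = 1,\ \tilde{N}_{b-1}(\omega) = n - 1\}) \\
&\quad + P(\{\omega : S_1(\omega) = -1,\ \tilde{N}_{b+1}(\omega) = n - 1\}) \\
&= P(\{\omega : S_1(\omega) = 1\})\, P(\{\omega : \tilde{N}_{b-1}(\omega) = n - 1\}) \\
&\quad + P(\{\omega : S_1(\omega) = -1\})\, P(\{\omega : \tilde{N}_{b+1}(\omega) = n - 1\}) \\
&= p\, P(\{\omega : N_{b-1}(\omega) = n - 1\}) + (1 - p)\, P(\{\omega : N_{b+1}(\omega) = n - 1\}).
\end{aligned} \tag{11.3}$$

We have used the fact that $S_1$ and $\tilde{S}$ are independent. Now multiply by $s^n$, $0 < s \le 1$, sum from $n = 1$ to $\infty$, and use the fact that $P(\{\omega : N_b(\omega) = 0\}) = 0$ to obtain

$$\rho_b(s) = p s \rho_{b-1}(s) + (1 - p) s \rho_{b+1}(s). \tag{11.4}$$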


For each fixed $s$ this is a linear homogeneous second-order difference equation. As such, its solution set consists of all linear combinations of any two linearly independent solutions. We look for solutions of the form $\rho_b(s) = [\lambda(s)]^b$. Inserting this expression into (11.4) we obtain a quadratic equation for $\lambda(s)$:

$$(1 - p) s [\lambda(s)]^2 - \lambda(s) + p s = 0.$$


We solve for the two solutions and conclude that, for appropriate $\alpha(s)$ and $\beta(s)$,

$$\rho_b(s) = \alpha(s) \left( \frac{1 + \sqrt{1 - 4p(1-p)s^2}}{2(1-p)s} \right)^{b} + \beta(s) \left( \frac{1 - \sqrt{1 - 4p(1-p)s^2}}{2(1-p)s} \right)^{b}.$$

The conditions $\rho_0(s) = \rho_c(s) = 1$ determine the functions $\alpha$ and $\beta$. The result, after some algebraic simplification, is:

$$\begin{aligned}
\rho_b(s) ={}& (2ps)^{b}\, \frac{\left(1 + \sqrt{1 - 4p(1-p)s^2}\right)^{c-b} - \left(1 - \sqrt{1 - 4p(1-p)s^2}\right)^{c-b}}{\left(1 + \sqrt{1 - 4p(1-p)s^2}\right)^{c} - \left(1 - \sqrt{1 - 4p(1-p)s^2}\right)^{c}} \\
&+ (2(1-p)s)^{c-b}\, \frac{\left(1 + \sqrt{1 - 4p(1-p)s^2}\right)^{b} - \left(1 - \sqrt{1 - 4p(1-p)s^2}\right)^{b}}{\left(1 + \sqrt{1 - 4p(1-p)s^2}\right)^{c} - \left(1 - \sqrt{1 - 4p(1-p)s^2}\right)^{c}}.
\end{aligned}$$

Even though we have assumed $s > 0$ in the calculation, the preceding formula is also correct for $s = 0$, by continuity.

The expected value of the hitting time can be calculated by differentiating $\rho_b$. However, we find it here by multiplying (11.3) by $n$ and summing from 1 to $\infty$. The result is the following inhomogeneous linear second-order difference equation:

$$E(N_b) = p E(N_{b-1}) + (1 - p) E(N_{b+1}) + 1, \qquad 0 < b < c. \tag{11.5}$$


We begin by solving this difference equation under the assumption that all quantities are finite, first for the case $p \ne \tfrac12$. The general solution of the homogeneous equation

$$E(N_b) = p E(N_{b-1}) + (1 - p) E(N_{b+1})$$

can be obtained by the same method that was used for (11.4). It is

$$\gamma + \delta \left( \frac{p}{1-p} \right)^{b}.$$

The function $b/(2p - 1)$ is easily seen to be a particular solution of (11.5), so that the general solution is

$$\gamma + \delta \left( \frac{p}{1-p} \right)^{b} + \frac{b}{2p - 1}.$$

The conditions $E(N_0) = E(N_c) = 0$ determine $\gamma$ and $\delta$. The result is

$$E(N_b) = \frac{b\,[1 - (p/(1-p))^{c}] - c\,[1 - (p/(1-p))^{b}]}{(2p - 1)\,[1 - (p/(1-p))^{c}]}.$$

When $p = 1/2$, the general solution to (11.5) is of the form $\gamma + \delta b - b^2$. Solving for $\gamma$ and $\delta$ as above gives

$$E(N_b) = b(c - b).$$
To confirm that our formulas are really correct, that is, that both sides of (11.5) are finite, we note first that, for each $b$, $N_b \le Mc$, where

$$M(\omega) = \inf\{n : \omega \in A_n\}$$

and

$$A_n = \{\omega : X_{nc}(\omega) = X_{nc-1}(\omega) = \cdots = X_{nc-(c-1)}(\omega) = 1\}.$$

The events $A_n$ are independent and the probability of each equals $p^c$. Therefore,

$$P(\{\omega : M(\omega) > m\}) = (1 - p^c)^m.$$

Thus, $M$ is geometrically distributed and, hence, has finite mean. So, each $N_b$ also has finite mean.
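
The closed forms obtained in Example 4 can be tested against the difference equations they are supposed to solve. The following sketch is ours and rests on the formulas as written above; the function names are hypothetical, and the generating function routine should not be called with $p = 1/2$ and $s = 1$, where the expression is defined only by continuity.

```python
from math import sqrt

def pgf(b, c, p, s):
    """Closed form for rho_b(s), assuming 0 < s <= 1 and 4p(1-p)s^2 < 1."""
    d = sqrt(1 - 4 * p * (1 - p) * s * s)
    denom = (1 + d) ** c - (1 - d) ** c
    first = (2 * p * s) ** b * ((1 + d) ** (c - b) - (1 - d) ** (c - b)) / denom
    second = (2 * (1 - p) * s) ** (c - b) * ((1 + d) ** b - (1 - d) ** b) / denom
    return first + second

def expected_time(b, c, p):
    """Closed form for E(N_b), with the separate formula b(c - b) when p = 1/2."""
    if p == 0.5:
        return b * (c - b)
    r = p / (1 - p)
    return (b * (1 - r ** c) - c * (1 - r ** b)) / ((2 * p - 1) * (1 - r ** c))

def residuals(c, p, s):
    """Largest violations of (11.4) and (11.5) over 0 < b < c."""
    pgf_res = max(abs(pgf(b, c, p, s)
                      - p * s * pgf(b - 1, c, p, s)
                      - (1 - p) * s * pgf(b + 1, c, p, s))
                  for b in range(1, c))
    mean_res = max(abs(expected_time(b, c, p)
                       - p * expected_time(b - 1, c, p)
                       - (1 - p) * expected_time(b + 1, c, p) - 1)
                   for b in range(1, c))
    return pgf_res, mean_res
```

As a sanity check on a case that can be done by hand, take $c = 2$ and $b = 1$: the walk hits $\{-1, 1\}$ at its first step, so `pgf(1, 2, p, s)` should equal `s` and `expected_time(1, 2, p)` should equal 1.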


Problem 27. Show that $\rho_b(1-) = 1$, where $\rho_b$ is as in the last example. What conclusion can you draw? Does that conclusion seem intuitively reasonable? Also, calculate the expected value of the hitting time of the set $\{-(c - b), b\}$ by using the derivative of $\rho_b$. Hint: You can simplify your computations by noting that when $s = 1$, the expression under the radical is a perfect square.


* Problem 28. For the random walks of Example 4, let $\sigma_b$ be the probability generating function of the hitting time of $\{b\}$. Prove that $\sigma_b = (\sigma_1)^b$ for $b > 0$ and obtain a similar formula in case $b < 0$. By an argument similar to that used in Example 4, find another relation between $\sigma_1$ and $\sigma_2$ and then use the two relations between $\sigma_1$ and $\sigma_2$ to evaluate $\sigma_1$. Alternatively, evaluate $\sigma_1$ by letting $c$ go to $\infty$ in Example 4. Calculate the distribution of the hitting time of $\{1\}$ (illustrated in Figure 11.3); in particular evaluate the probability that the random walk never hits $\{1\}$. Finally, calculate the distribution of the global maximum (or supremum in case the global maximum does not exist) of the random walk. Hint: After obtaining a formula for $\sigma_1$ rationalize the denominator if necessary. Also, see Problem 33 in Chapter 5.


The following exercise shows how a hitting time may be used to simplify a 
formula that seems difficult to treat by direct algebraic techniques. 


Problem 29. Evaluate the following sum for $0 < p \le 1$:

$$\sum_{n=k}^{\infty} \binom{n}{k} p^{k} (1 - p)^{n-k}.$$

Hint: Consider a random walk in $\mathbb{Z}$ whose step distribution is supported by $\{0, 1\}$.


11.6. Returns to 0 


The hitting time of $\{0\}$ for a random walk $(S_0, S_1, \dots)$ adapted to a filtration $(\mathcal{F}_n : n \ge 0)$ equals 0 (with probability 1). Of more interest is the first return time to 0,

$$T_1(\omega) = \inf\{n > 0 : S_n(\omega) = 0\},$$

with, as is usual, $\inf \emptyset$ defined to equal $\infty$. Since

$$\{\omega : T_1(\omega) \le n\} = \bigcup_{m=1}^{n} \{\omega : S_m(\omega) = 0\} \in \mathcal{F}_n,$$

$T_1$ is a stopping time.


FIGURE 11.3. The hitting time of $\{1\}$ (possibly $\infty$ if $p < \tfrac12$)


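Example 5. Let us calculate the distribution of the first return time $T_1$ to 0 for the simple random walk $S$ in $\mathbb{Z}$ with step distribution $Q$ given by

$$Q(\{1\}) = p = 1 - Q(\{-1\}).$$

We take $0 < p < 1$.

As in Example 4 we let $\tilde{S}$ denote the random walk obtained by using the steps of $S$ beginning with the second step. Thus, $\tilde{S}$ is independent of $S_1$ and, for $n = 0, 1, 2, \dots$, $\tilde{S}_n = S_{n+1} - S_1$. Let $M_b$ denote the hitting time of $\{b\}$ by the process $\tilde{S}$. Then, for $n > 0$,

$$\begin{aligned}
P(\{\omega : T_1(\omega) = n\})
&= P(\{\omega : S_1(\omega) = 1,\ M_{-1}(\omega) = n - 1\}) \\
&\quad + P(\{\omega : S_1(\omega) = -1,\ M_{1}(\omega) = n - 1\}) \\
&= P(\{\omega : S_1(\omega) = 1\})\, P(\{\omega : M_{-1}(\omega) = n - 1\}) \\
&\quad + P(\{\omega : S_1(\omega) = -1\})\, P(\{\omega : M_{1}(\omega) = n - 1\}) \\
&= p\, P(\{\omega : M_{-1}(\omega) = n - 1\}) + (1 - p)\, P(\{\omega : M_{1}(\omega) = n - 1\}).
\end{aligned}$$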




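Using the formula obtained for the distribution of $M_1$ in Problem 28 and a similar formula for $M_{-1}$, we obtain, for positive even $n$,

$$\begin{aligned}
P(\{\omega : T_1(\omega) = n\})
&= p \left[ p^{(n/2)-1} (1 - p)^{n/2} \right] \frac{(n-2)!}{(n/2)!\,((n/2) - 1)!}
 + (1 - p) \left[ p^{n/2} (1 - p)^{(n/2)-1} \right] \frac{(n-2)!}{(n/2)!\,((n/2) - 1)!} \\
&= 2 [p(1 - p)]^{n/2}\, \frac{(n-2)!}{(n/2)!\,((n/2) - 1)!} \\
&= \frac{2}{n/2} \binom{n-2}{(n/2) - 1} [p(1 - p)]^{n/2};
\end{aligned}$$

and

$$P(\{\omega : T_1(\omega) = \infty\}) = |2p - 1|.$$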


We notice that: (i) twice the Catalan numbers (see Problem 33 of Chapter 5) appear as coefficients of $[p(1 - p)]^{n/2}$ and (ii) $T_1 < \infty$ a.s. if and only if $p = \tfrac12$. For the case $p = \tfrac12$, the distribution function of $T_1$ is shown via the solid graph in Figure 11.4, with the jumps being filled in as a visual aid. (The dashed graph is related to the Glivenko-Cantelli Theorem described in the next chapter.)
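
The two conclusions of Example 5 are consistent with one another: summing the probabilities $P(T_1 = n)$ over even $n$ should give $1 - |2p - 1|$. The sketch below (ours, with hypothetical function names and an arbitrary truncation point) checks this numerically for a value of $p$ different from $1/2$, where the series converges geometrically.

```python
from math import comb

def first_return_pmf(n, p):
    """P(T_1 = n) for the simple walk on Z, using the binomial-coefficient form above."""
    if n <= 0 or n % 2 == 1:
        return 0.0                      # returns to 0 happen only at positive even times
    half = n // 2
    return (2 / half) * comb(n - 2, half - 1) * (p * (1 - p)) ** half

def return_probability(p, n_max=400):
    """Partial sum of P(T_1 = n) over even n <= n_max, approximating P(T_1 < infinity)."""
    return sum(first_return_pmf(n, p) for n in range(2, n_max + 1, 2))
```

For $p = 0.3$ the value of `return_probability(0.3)` should agree with $1 - |2p - 1| = 0.6$ to many decimal places. For $p = 1/2$ the series still sums to 1, but so slowly that a truncated sum is a poor approximation.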


* Problem 30. For $T_1$ defined as in the preceding example show that $E(T_1) = \infty$, even if $p = 1/2$. For the case $p = 1/2$, decide which moments of $T_1$ are finite.


Let $T_1$ be the first return time to 0 of a random walk $S$. For $j > 1$ recursively define the $j$th return time to 0 of $S$ by

$$T_j(\omega) = \inf\{n > T_{j-1}(\omega) : S_n(\omega) = 0\},$$

where, as usual, $\inf \emptyset = \infty$. The proof that each $T_j$ is a stopping time with respect to the minimal filtration of $S$ is similar to that for the first return time to 0.


Theorem 12. For $j = 1, 2, \dots$, let $T_j$ denote the $j$th return time of some random walk to 0. Let $R$ be the distribution of $T_1$. Then the distribution of $(0, T_1, T_2, \dots)$ is that of a random walk in $\overline{\mathbb{Z}}^+$ with step distribution $R$.


Problem 31. Prove the preceding theorem. Hint: Use Proposition 3. Show that one may take each $B_k$ in that theorem to be a one-point set in the current situation.


Corollary 13. Let $Q$ denote the step distribution of a random walk $S$, and let

$$V(\omega) = \inf\{j : T_j(\omega) = \infty\} = \#\{n \ge 0 : S_n(\omega) = 0\}.$$

Then either

$$\sum_{n=0}^{\infty} Q^{*n}(\{0\}) = \infty,$$

in which case $V = \infty$ a.s., or

$$\sum_{n=0}^{\infty} Q^{*n}(\{0\}) < \infty,$$

in which case the distribution of $V$ is of geometric type, supported by $\{1, 2, \dots\}$, and

$$P(\{\omega : V(\omega) = 1\}) = P(\{\omega : T_1(\omega) = \infty\}) = \frac{1}{\sum_{n=0}^{\infty} Q^{*n}(\{0\})}. \tag{11.6}$$

(The possibility that $P(\{\omega : V(\omega) = 1\}) = 1$ is not excluded by the phrase "of geometric type".)


FIGURE 11.4. Distribution function $G$ of first return time to 0, and empirical distribution function
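
Formula (11.6) can be compared with the value $P(T_1 = \infty) = |2p - 1|$ found in Example 5 for the simple random walk in $\mathbb{Z}$. For that walk $Q^{*n}(\{0\}) = P(S_n = 0)$ vanishes for odd $n$ and equals $\binom{2k}{k} [p(1-p)]^k$ for $n = 2k$, a standard binomial count that we take as given here. The sketch below is ours, with a hypothetical function name and an arbitrary truncation point.

```python
from math import comb

def visits_series(p, n_max=400):
    """Partial sum over n <= n_max of Q^{*n}({0}) = P(S_n = 0) for the simple walk on Z."""
    total = 0.0
    for k in range(n_max // 2 + 1):
        # n = 2k: choose which k of the 2k steps are +1.
        total += comb(2 * k, k) * (p * (1 - p)) ** k
    return total
```

With $p = 0.3$ the reciprocal `1 / visits_series(0.3)` should be close to $|2p - 1| = 0.4$, in agreement with Example 5.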


* Problem 32. Prove the preceding corollary. Hint: Once it is known that V has 
geometric type, a relation between E(V) and P({w: V(w) = 1}) can be obtained. 




The random variable $V$ in the preceding result counts the number of times (including the time $n = 0$) at which the random walk is at 0. By Problem 5 of Chapter 6, if we define $A_n = \{\omega : S_n(\omega) = 0\}$, then $V(\omega) = \infty$ if and only if $\omega \in \limsup_n A_n$. Thus, part of Corollary 13 may be rephrased in a way that is reminiscent of the Borel and Borel-Cantelli Lemmas:

$$P(\limsup_{n \to \infty} A_n) = \begin{cases} 1 & \text{if } \sum_{n=0}^{\infty} P(A_n) = \infty \\ 0 & \text{if } \sum_{n=0}^{\infty} P(A_n) < \infty. \end{cases} \tag{11.7}$$

When $P(\limsup_n A_n) = 1$, the random walk $S$ is called recurrent. Otherwise, $S$ is called transient. Note that the Borel Lemma can be used to get part of (11.7), but that the Borel-Cantelli Lemma cannot be used to get the other part, since typically the events $A_n$ are neither uncorrelated nor negatively correlated.


Problem 33. Apply Corollary 13 to the random walk of Problem 12. 

Problem 34. Let $S$ be simple symmetric random walk in $\mathbb{Z}$. Show that
$$\lim_{n \to \infty} \sqrt{\pi n}\, P(\{\omega : S_{2n}(\omega) = 0\}) = 1.$$

Conclude that $S$ is recurrent. Hint: Use the Stirling Formula.
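
Before proving the limit in Problem 34 it can be reassuring to see it numerically. The sketch below is ours: it uses the standard count $P(S_{2n} = 0) = \binom{2n}{n} 2^{-2n}$ and evaluates it through logarithms of factorials so that large $n$ causes no overflow. The function name is hypothetical, and a numerical check is of course no substitute for the proof requested.

```python
from math import exp, lgamma, log, pi, sqrt

def scaled_return_probability(n):
    """sqrt(pi * n) * P(S_{2n} = 0) for the simple symmetric walk on Z, n >= 1."""
    # log of C(2n, n) / 4^n, computed with lgamma(k + 1) = log(k!)
    log_prob = lgamma(2 * n + 1) - 2 * lgamma(n + 1) - 2 * n * log(2)
    return sqrt(pi * n) * exp(log_prob)
```

The values of `scaled_return_probability(n)` increase toward 1 as `n` grows, so $P(S_{2n} = 0)$ behaves like $1/\sqrt{\pi n}$, whose sum over $n$ diverges.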


Problem 35. Let $S^{(i)}$, $i = 1, \dots, d$, be independent simple symmetric random walks in $\mathbb{Z}$, and let $S$ be the sequence of $\mathbb{Z}^d$-valued random variables defined by

$$S_n = (S_n^{(1)}, \dots, S_n^{(d)}), \qquad n = 0, 1, 2, \dots.$$

Show that $S$ is a random walk on $\mathbb{Z}^d$ and describe its step distribution. Use Problem 34 to show that $S$ is recurrent if $d = 2$ and transient if $d \ge 3$.


Problem 36. Show that simple symmetric random walk in $\mathbb{Z}^2$ is recurrent. Hint: Find a linear transformation $L$ from $\mathbb{R}^2$ onto $\mathbb{R}^2$ such that $S$ has the same distribution as the sequence $(L(\tilde{S}_0), L(\tilde{S}_1), L(\tilde{S}_2), \dots)$, where $\tilde{S}$ is the random walk from the preceding exercise with $d = 2$.


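Problem 37. Let $S$ be simple symmetric random walk in $\mathbb{Z}^3$. Show that

$$P(\{\omega : S_{2n}(\omega) = 0\}) = 2^{-2n} \binom{2n}{n} \sum_{j=0}^{n} \sum_{k=0}^{n-j} \left( 3^{-n}\, \frac{n!}{j!\, k!\, (n - j - k)!} \right)^{2}$$

for $n = 0, 1, 2, \dots$. Conclude that $S$ is transient. Hint: Use the fact that for nonnegative $a_1, \dots, a_m$,

$$\sum_{f=1}^{m} a_f^2 \le \left( \max_{1 \le f \le m} a_f \right) \sum_{f=1}^{m} a_f.$$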




Problem 38. Show that simple symmetric random walk in $\mathbb{Z}^d$ is transient for $d \ge 3$. (This problem can be done in a straightforward manner by generalizing the calculation done in the preceding problem. But there is an interesting alternative method: Project the $d$-dimensional random walk onto $\mathbb{Z}^3$, and then use the result of the preceding problem to show that this projection is transient. Some care is required to carry out this second method rigorously.)


In Chapters 12, 13, and 25 we will develop further tools for analyzing the returns to 0 of random walks. In particular, we will prove that if $S$ is a random walk on $\mathbb{Z}$ such that $E(S_1)$ exists, then the probability that $S$ returns to 0 is less than 1 if $E(S_1) \ne 0$ (Chapter 12) and equal to 1 if $E(S_1) = 0$ (Chapter 13).


11.7. + Random walks in various settings 


As indicated earlier, one can define random walks in semigroups other than $\mathbb{R}^d$, $\mathbb{Z}^d$, $\overline{\mathbb{R}}^+$ and $\overline{\mathbb{Z}}^+$. Some random walks in various semigroups are described in the following problems.


Problem 39. Consider the collection $\Psi$ of all subsets of the finite set $\{1, \dots, m\}$. Thus $\Psi$ has $2^m$ members. Prove that $(\Psi, \triangle)$ is a commutative group under the symmetric difference operation $(\psi, \chi) \mapsto \psi \triangle \chi$. In particular, identify the identity in $\Psi$ as well as the inverse of each member of the group.


* Problem 40. Let $S$ denote the random walk in the group $\Psi$ of the preceding problem whose step distribution $Q$ assigns probability $1/m$ to each singleton. Calculate


P({w: Salu) = {out} Pe 


* Problem 41. Consider the random walk of the preceding problem for the case $m = 2$. Calculate the distribution of $S_n$ for each $n$. Do the same for the case $m = 3$.


* Problem 42. Consider the semigroup $\Psi$ of all $2^m$ subsets of $\{1, \dots, m\}$ under the operation of intersection. Let $S$ denote the random walk having the step distribution that assigns equal probability to each member of $\Psi$. For $j = 1, \dots, m$, let $N_j$ be the hitting time of the collection of sets not containing $j$. Prove that the random variables $N_j$, $1 \le j \le m$, are independent and geometrically distributed. Calculate the distribution of the hitting time of the one-point set $\{\emptyset\}$. For $1 \le k \le m$ find a formula for the distribution of the hitting time of the one-point set $\{\{1, 2, \dots, k\}\}$. Calculate explicitly the probability that this hitting time equals $\infty$ for the case $k = m - 1$.


Problem 43. Let $\Psi$ be the semigroup of all infinite sequences of 1's and 0's under the operation of term-by-term multiplication. Consider the random walk whose step distribution assigns probability $2^{-k}$ to the singleton $\{(1, \dots, 1, 0, 1, 1, \dots)\}$, where the only 0 is in the $k$th position. For $i = 1, 2, 3$, calculate the distribution of the hitting time of the set $\{\psi : \psi_1 = \cdots = \psi_i = 0\}$.




Problem 44. Fix a positive integer $m$ and consider the collection $K$ of all size-two subsets of $\{1, \dots, m\}$. Let $\Psi$ denote the family of all subcollections of $K$. How many members does $\Psi$ have? Notice that a member $\psi$ of $\Psi$ can be interpreted as a graph with $m$ vertices: the graph corresponding to $\psi$ contains an edge connecting the vertices $i$ and $j$ if and only if the pair $\{i, j\}$ is a member of $\psi$. Regard $\Psi$ as a semigroup under the operation of union. Consider the random walk whose step distribution assigns probability $\binom{m}{2}^{-1}$ to each singleton of $\Psi$. For $m = 2, 3, 4$, calculate the distribution of the hitting time of the collection of connected graphs. For general $m$ calculate the distribution of the hitting time of the one-point set $\{K\}$. Hint: Use inclusion-exclusion for the last part.


Problem 45. Consider a random walk $S$ in a finite commutative group whose step distribution assigns equal probability to each member of the group. Calculate the distribution of each $S_n$, and of the sequence $S$. Calculate the distribution of the first return time to 0.


Problem 46. Show that any random walk in a finite commutative group is recurrent.


CHAPTER 12 
Theorems of A.S. Convergence 


In this chapter we study the convergence of certain sequences defined in terms of sums of independent random variables. We will chiefly be interested in almost sure convergence, but it will be useful to also consider another weaker type of convergence, called 'convergence in probability'. The two major results concerning almost sure convergence are the Strong Law of Large Numbers and the Kolmogorov Three-Series Theorem. Other results included in this chapter are three important tools: the Kolmogorov 0-1 Law, the Hewitt-Savage 0-1 Law, and an important inequality, known as the Etemadi Lemma. As an application of the ideas contained in the proof of the Strong Law of Large Numbers, we also determine the asymptotic behavior of the size of the image of a random walk.


12.1. Convergence in probability 


The following definition gives a name to a type of convergence that we have 
already encountered in the Weak Law of Large Numbers. 


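Definition 1. Let $Y_n$, $n = 1,2,\dots$, and $Y$ be $\overline{\mathbb{R}}$-valued random variables
defined on a common probability space $(\Omega, \mathcal{F}, P)$. For each $\varepsilon > 0$, let

$$
\begin{aligned}
A_n^{\varepsilon} &= \{\omega : |Y(\omega)| < \infty \text{ and } |Y(\omega) - Y_n(\omega)| > \varepsilon\},\\
B_n^{\varepsilon} &= \{\omega : Y(\omega) = \infty \text{ and } Y_n(\omega) < 1/\varepsilon\},\\
C_n^{\varepsilon} &= \{\omega : Y(\omega) = -\infty \text{ and } Y_n(\omega) > -1/\varepsilon\}.
\end{aligned}
$$

Then $Y_n \to Y$ in probability as $n \to \infty$ if, for each $\varepsilon > 0$,

$$\lim_{n\to\infty} P(A_n^{\varepsilon} \cup B_n^{\varepsilon} \cup C_n^{\varepsilon}) = 0.$$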


The phrase ‘in probability’ will often be abbreviated ‘i.p.’, and if $Y_n \to Y$ i.p.
as $n \to \infty$, we may also write

$$\lim_{n\to\infty} Y_n = Y \text{ i.p.}$$




The Law of Large Numbers in Chapter 5 implies that if $S$ is the sequence
of partial sums of an iid sequence of $\mathbb{R}$-valued random variables having finite
second moments, then $S_n/n \to E(S_1)$ i.p. The Strong Law of Large Numbers, to
be stated and proved later in the present chapter, implies that $S_n/n \to E(S_1)$ a.s.
(In addition, we will find that the second moment hypothesis can be replaced
by the weaker assumption that $E(S_1)$ exists.) The term ‘strong’ is often used to
describe a result that asserts almost sure convergence.

The next problem and the theorem and example following the problem give 
important information about the relationship between i.p. convergence and a.s. 
convergence. 


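Problem 1. Let $Y$, $A_n^{\varepsilon}$, $B_n^{\varepsilon}$, $C_n^{\varepsilon}$ be as in Definition 1. Prove that $\lim_n Y_n = Y$ a.s.
if and only if

$$P\big(\limsup_{n\to\infty}\,[A_n^{\varepsilon} \cup B_n^{\varepsilon} \cup C_n^{\varepsilon}]\big) = 0$$

for all $\varepsilon > 0$.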


Theorem 2. Let $Y_n$, $n = 1,2,\dots$, and $Y$ be $\overline{\mathbb{R}}$-valued random variables
defined on a probability space $(\Omega, \mathcal{F}, P)$. If $Y_n \to Y$ a.s. as $n \to \infty$, then
$Y_n \to Y$ i.p. as $n \to \infty$.


Problem 2. Prove the preceding theorem. Hint: Use Problem 1 in conjunction with 
Problem 9 of Chapter 6. 


(Counter)example 1. Let $P$ denote Lebesgue measure on $\Omega = [0,1)$. For
$m = 0,1,2,\dots$ and $2^m \le n < 2^{m+1}$ let $Y_n$ denote the indicator function of the
interval

$$\left[\frac{n}{2^m} - 1,\ \frac{n+1}{2^m} - 1\right).$$

It is clear that, for each $\omega$, $\limsup Y_n(\omega) = 1$ and $\liminf Y_n(\omega) = 0$ and, hence,
that, for every $\omega$, the sequence $(Y_n(\omega): n = 1,2,\dots)$ fails to converge. Since
$P(\{\omega: Y_n(\omega) \ne 0\}) \to 0$ we see that $Y_n \to 0$ i.p. Thus, a.s. convergence is
strictly stronger than i.p. convergence.
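
The behavior in this counterexample can be checked by exact arithmetic. The sketch below assumes that the interval attached to $n$ is $[n/2^m - 1, (n+1)/2^m - 1)$ with $2^m \le n < 2^{m+1}$; the function names are ours and purely illustrative. It confirms that a fixed point is covered exactly once in every dyadic block (so the sequence of values keeps returning to both 0 and 1) while the probability of a nonzero value shrinks to 0.

```python
from fractions import Fraction

def typewriter_interval(n):
    """Endpoints of the interval whose indicator is Y_n, for n >= 1."""
    m = n.bit_length() - 1            # the unique m with 2**m <= n < 2**(m+1)
    return Fraction(n, 2**m) - 1, Fraction(n + 1, 2**m) - 1

def Y(n, w):
    """Value of the indicator Y_n at the point w in [0, 1)."""
    left, right = typewriter_interval(n)
    return 1 if left <= w < right else 0

def prob_nonzero(n):
    """Lebesgue measure of {w : Y_n(w) != 0}."""
    left, right = typewriter_interval(n)
    return right - left

w = Fraction(1, 3)
values = [Y(n, w) for n in range(1, 2**10)]          # values[i] is Y_{i+1}(w)
# Count how often w is covered within each block 2**m <= n < 2**(m+1).
hits_per_block = [sum(values[2**m - 1: 2**(m + 1) - 1]) for m in range(10)]
print(hits_per_block)
print([prob_nonzero(2**m) for m in range(6)])
```

Every block contributes exactly one index with $Y_n(\omega) = 1$ and all other indices in the block give 0, which is the pointwise non-convergence; the measures $2^{-m}$ tend to 0, which is the convergence in probability.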


In the setting of a general measure space $(\Omega, \mathcal{F}, \mu)$, ‘convergence in probability’
is replaced by convergence in measure. That is, we require that

$$\lim_{n\to\infty} \mu(A_n^{\varepsilon} \cup B_n^{\varepsilon} \cup C_n^{\varepsilon}) = 0$$

for all $\varepsilon > 0$, where $A_n^{\varepsilon}$, $B_n^{\varepsilon}$, $C_n^{\varepsilon}$ are defined as in Definition 1. It is important
to remember that in an infinite measure space, convergence almost everywhere
does not imply convergence in measure.


Problem 3. Provide an example in which there is almost everywhere convergence 
but not convergence in measure. 




The following lemma enables us to replace the hypothesis of convergence al- 
most everywhere in some theorems of Chapter 8 by the hypothesis of convergence 
in measure. 


Lemma 3. Let $f_n$, $n = 1,2,\dots$, and $f$ be $\overline{\mathbb{R}}$-valued measurable functions
defined on a common measure space. If $f_n \to f$ in measure as $n \to \infty$, then there
exists a subsequence of $(f_n)$ that converges to $f$ almost everywhere.


Problem 4. Prove the preceding lemma. Hint: Recall that the Borel Lemma is 
true for general measure spaces, and apply the general measure space version of 
Problem 1. 


Each of the next three results is proved in the following manner: Use con- 
vergence in measure to find a subsequence that converges almost everywhere, 
apply the corresponding result from Chapter 8 to the subsequence, and then use 
Proposition 4 of Appendix B, if necessary, to get back to the original sequence. 


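Lemma 4. [Fatou] Let $f_n$, $n = 1,2,\dots$, and $f$ be $\overline{\mathbb{R}}^+$-valued measurable
functions defined on a measure space $(\Omega, \mathcal{F}, \mu)$. Suppose that $f_n \to f$ in measure
as $n \to \infty$. Then

$$\int f \, d\mu \le \liminf_{n\to\infty} \int f_n \, d\mu.$$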


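Theorem 5. [Dominated Convergence] Let $f$, $g$, and $f_n$, $n = 1,2,\dots$, be
$\overline{\mathbb{R}}$-valued measurable functions defined on a measure space $(\Omega, \mathcal{F}, \mu)$. Suppose
that $|f_n(\omega)| \le g(\omega)$ for almost every $\omega$ and every $n$, that $\int g \, d\mu < \infty$, and that
$f_n \to f$ in measure as $n \to \infty$. Then

$$\int |f| \, d\mu < \infty, \qquad \lim_{n\to\infty} \int f_n \, d\mu = \int f \, d\mu, \qquad \text{and} \qquad \lim_{n\to\infty} \int |f - f_n| \, d\mu = 0.$$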


Theorem 6. [Uniform Integrability Criterion] Let $(X_1, X_2, \dots)$ be a sequence
of $\overline{\mathbb{R}}$-valued random variables whose limit $X$ exists in probability. Assume that
$E(|X_n|) < \infty$ for all $n$. Then, the following three statements are equivalent:

(i) the family $\{X_n: n = 1,2,\dots\}$ is uniformly integrable;
(ii) $E(|X|) < \infty$ and $\lim_{n\to\infty} E(|X_n - X|) = 0$;
(iii) $\lim_{n\to\infty} E(|X_n|) = E(|X|) < \infty$.
Each of these three conditions implies
(iv) $\lim_{n\to\infty} E(X_n) = E(X)$.


Problem 5. Prove the preceding three theorems. Hint: For a proof by contradiction
that (iii) $\Rightarrow$ (i) in the Uniform Integrability Criterion, find a subsequence $(X_{n_k})$
such that no further subsequence is uniformly integrable.




12.2. Laws of Large Numbers 


Let $S = (S_n: n = 1,2,\dots)$ be the sequence of partial sums of an iid sequence
$(X_n: n = 1,2,\dots)$ of $\mathbb{R}$-valued random variables. We will study the behavior
of $S_n/n$ as $n \to \infty$ by a sequence of lemmas, leading up to the main result,
the Strong Law of Large Numbers, which is Theorem 14. These lemmas are 
interesting in their own right, because their proofs illustrate commonly used 
techniques. In some cases, the lemmas are given in greater generality than that 
needed for their application in the proof of Theorem 14. For example, our first 
four lemmas make no mention of independence. In fact, it will be seen that even 
the final result only requires pairwise-independence. 


Lemma 7. Suppose that a sequence $(Y_n: n = 1,2,\dots)$ of $\mathbb{R}$-valued random
variables satisfies $E(Y_n) \to c$ for some real constant $c$ and $\sum_n \operatorname{Var}(Y_n) < \infty$.
Then $Y_n \to c$ a.s. as $n \to \infty$.


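PROOF. Let $Z_n = Y_n - E(Y_n)$ for each $n$, and

$$W = \sum_{n=1}^{\infty} Z_n^2,$$

an $\overline{\mathbb{R}}^+$-valued random variable. By the Monotone Convergence Theorem,

$$E(W) = \sum_{n=1}^{\infty} E(Z_n^2) = \sum_{n=1}^{\infty} \operatorname{Var}(Y_n) < \infty.$$

Therefore, $W < \infty$ a.s. and, hence, with probability one $Z_n \to 0$ as $n \to \infty$. Since
$E(Y_n) \to c$, it follows that $Y_n \to c$ a.s. $\Box$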


Problem 6. Give an alternative proof of the preceding lemma using the Chebyshev 
Inequality and the Borel Lemma. 


Problem 7. Show that if the hypothesis $\sum_n \operatorname{Var}(Y_n) < \infty$ in Lemma 7 is replaced
by the weaker hypothesis $\lim_n \operatorname{Var}(Y_n) = 0$, then we may conclude that $\lim_n Y_n = c$
i.p.


Problem 8. Let $(S_n: n = 1,2,\dots)$ be the sequence of partial sums of an iid
sequence of $\mathbb{R}$-valued random variables having finite second moment. Show that if
$(n_k: k = 1,2,\dots)$ is an increasing sequence of positive integers such that for some
$p > 1$, $n_k \ge k^p$, $k = 1,2,\dots$, then

$$\lim_{k\to\infty} \frac{S_{n_k}}{n_k} = E(S_1) \text{ a.s.}$$


The preceding problem shows that Lemma 7 can be used to get the almost sure 
convergence of $S_n/n$ along a subsequence, provided the summands are assumed 
to have finite second moments. But we will not be making such an assumption 
in the Strong Law of Large Numbers. In order to be able to apply results 
like Lemma 7 when second moments are not finite, we will need to work with 




summands that have been truncated. The following two lemmas introduce the 
appropriate truncation and show that when the summands have finite means, 
this truncation makes no essential difference in the limiting behavior of $S_n/n$. 


Lemma 8. Let $(X_n: n = 1,2,\dots)$ be a sequence of identically distributed
random variables having finite mean. Then

$$P\big(\limsup_{n\to\infty}\,\{\omega: |X_n(\omega)| > n\}\big) = 0.$$


PROOF. Let $F$ denote the common distribution function of the random
variables $|X_n|$. Using the expression for the mean of a nonnegative random variable
given in Corollary 20 of Chapter 4, we obtain

$$\infty > \int_0^{\infty} [1 - F(x)] \, dx \ge \sum_{n=1}^{\infty} [1 - F(n)] = \sum_{n=1}^{\infty} P(\{\omega: |X_n(\omega)| > n\}).$$

An appeal to the Borel Lemma completes the proof. $\Box$


Problem 9. Generalize the conclusion of the preceding lemma by replacing “$> n$”
by “$> cn$”, where $c$ is an arbitrary positive constant. Prove this generalization
and then use it to conclude that

$$\lim_{n\to\infty} \frac{X_n}{n} = 0 \text{ a.s.}$$

for any sequence $(X_n)$ of identically distributed $\mathbb{R}$-valued random variables having
finite mean.


Lemma 9. Let $(X_n: n = 1,2,\dots)$ be a sequence of identically distributed
random variables having finite mean. For $n = 1,2,\dots$ let $Y_n = X_n I_{\{\omega: |X_n(\omega)| \le n\}}$.
Let $(S_n: n = 1,2,\dots)$ and $(T_n: n = 1,2,\dots)$ denote the sequences of partial
sums of the sequences $(X_n)$ and $(Y_n)$, respectively. Then

$$\lim_{n\to\infty} \left(\frac{S_n}{n} - \frac{T_n}{n}\right) = 0 \text{ a.s.}$$

and

$$\lim_{n\to\infty} \frac{E(T_n)}{n} = E(X_1).$$


PROOF. The first conclusion is left as an exercise. To prove the second
conclusion, note that by an application of the Dominated Convergence Theorem,
requested below in Problem 10, $E(Y_n) \to E(X_1)$. It is a standard exercise to
show that if a sequence $(a_n)$ converges, then the sequence $(a_1, (a_1 + a_2)/2, (a_1 +
a_2 + a_3)/3, \dots)$ converges to the same limit. Taking $a_n = E(Y_n)$, we see that
$E(T_n)/n \to E(X_1)$ as $n \to \infty$. $\Box$
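
The fact about averages of a convergent sequence is easy to observe numerically. The following sketch is only an illustration; the sequence $a_n = 1 + 1/n$ and the helper name `running_averages` are our own choices, not part of the text.

```python
def running_averages(a):
    """Return [a1, (a1+a2)/2, (a1+a2+a3)/3, ...] for a finite list a."""
    out, total = [], 0.0
    for n, x in enumerate(a, start=1):
        total += x
        out.append(total / n)
    return out

# Illustrative choice: a_n = 1 + 1/n, which converges to 1.
a = [1.0 + 1.0 / n for n in range(1, 100_001)]
avg = running_averages(a)

for n in (10, 1_000, 100_000):
    print(n, a[n - 1], avg[n - 1])
```

The averages approach the same limit as the sequence itself, though more slowly; here the gap is the harmonic sum divided by $n$.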




Problem 10. Complete the proof of the preceding lemma, by doing the following:
(i) use Lemma 8 to show that $(S_n/n - T_n/n) \to 0$ a.s. as $n \to \infty$; (ii) show how
the Dominated Convergence Theorem applies to give $E(Y_n) \to E(X_1)$ as $n \to \infty$;
(iii) prove the fact stated in the proof about convergent sequences $(a_n)$.


Lemma 9 says that we can analyze the convergence of $S_n/n$ by studying that
of $T_n/n$. It is a standard technique to use the Borel Lemma to replace random
variables of interest by truncated random variables, as in Lemma 8 and Lemma 9.

The next lemma gives us control over the second moments of the truncated 
random variables. 


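Lemma 10. Let $X_n$ and $Y_n$ be as in Lemma 9. Then

$$\sum_{j=1}^{\infty} \frac{\operatorname{Var}(Y_j)}{j^2} < \infty.$$


PROOF. The function $(j, x) \mapsto x^2 I_{[-j,j]}(x)/j^2$ is a nonnegative measurable
function on the product space $\{1,2,\dots\} \times (-\infty,\infty)$. By the Fubini Theorem,
its integral with respect to the product of counting measure and the common
distribution $Q$ of the $X_n$ equals two iterated integrals. One of these is

$$
\begin{aligned}
\int_{(-\infty,\infty)} \Big(\sum_{j \ge |x|} \frac{x^2}{j^2}\Big) \, Q(dx)
&\le \int_{(-\infty,\infty)} x^2 \Big(\frac{1}{x^2} + \int_{|x|}^{\infty} \frac{dz}{z^2}\Big) \, Q(dx)\\
&= 1 + E(|X_1|) < \infty.
\end{aligned}
$$

The other is

$$\sum_{j=1}^{\infty} \frac{E(Y_j^2)}{j^2},$$

which must, therefore, also be finite. Since variances are no larger than second
moments, the proof is complete. $\Box$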
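
A concrete case may help to see what the lemma buys. As an illustration of our own (not taken from the text), suppose the common distribution has density $\tfrac32 x^{-5/2}$ on $[1,\infty)$. Then $E(X_1) = 3$ while $E(X_1^2) = \infty$, and the truncated second moment is $E(Y_j^2) = 3(\sqrt{j} - 1)$. The sketch below sums $E(Y_j^2)/j^2$ and compares the result with the bound $1 + E(|X_1|) = 4$ that appears in the proof.

```python
import math

def truncated_second_moment(j):
    """E(X^2; X <= j) for the assumed density (3/2) * x**(-5/2) on [1, inf).

    Integrating x**2 * (3/2) * x**(-5/2) from 1 to j gives 3 * (sqrt(j) - 1).
    """
    return 3.0 * (math.sqrt(j) - 1.0)

N = 10**6
partial_sum = sum(truncated_second_moment(j) / j**2 for j in range(1, N + 1))
bound = 1.0 + 3.0                     # 1 + E|X_1| for this distribution

print(partial_sum, bound)
```

The individual second moments grow like $\sqrt{j}$, yet the weighted series stays below the bound, which is exactly the control needed to apply Lemma 7 to truncated summands.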


Notice that the conclusion that $\sum_j \operatorname{Var}(Y_j)/j^2 < \infty$ in the preceding lemma
is easy to prove if $E(X_1^2) < \infty$. In the current development we only want
to assume the means to be finite. The history of probability contains many 
instances of theorems with superfluous assumptions on moments that have later 
been removed by better and often simpler proofs. 

In order to proceed further, we need to assume some independence. 


Lemma 11. Let $(X_n: n = 1,2,\dots)$ be a sequence of identically distributed
pairwise independent $\mathbb{R}$-valued random variables with finite mean, and let $T_n$ be
defined as in Lemma 9. Fix $c > 1$ and, for $m = 0,1,2,\dots$, let $b_m = \lfloor c^m \rfloor$. Then
$\lim_{m\to\infty} T_{b_m}/b_m = E(X_1)$ a.s.




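PROOF. By Lemma 9, $E(T_{b_m})/b_m \to E(X_1)$ as $m \to \infty$. In order to apply
Lemma 7 we calculate, using $b_m \ge c^m/2$ and the pairwise independence of the $Y_j$,

$$
\begin{aligned}
\sum_{m=0}^{\infty} \operatorname{Var}\Big(\frac{T_{b_m}}{b_m}\Big)
&\le 4 \sum_{m=0}^{\infty} \sum_{j=1}^{b_m} c^{-2m} \operatorname{Var}(Y_j)\\
&= 4 \sum_{j=1}^{\infty} \sum_{m \ge \log_c j} c^{-2m} \operatorname{Var}(Y_j)\\
&\le \frac{4}{1 - c^{-2}} \sum_{j=1}^{\infty} \frac{\operatorname{Var}(Y_j)}{j^2},
\end{aligned}
$$

which is finite by Lemma 10. An appeal to Lemma 7 completes the proof. $\Box$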
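
The calculation in this proof rests on a geometric-series estimate: for $b_m = \lfloor c^m \rfloor$ with $c > 1$ and any positive integer $j$, the sum of $b_m^{-2}$ over those $m$ with $b_m \ge j$ is at most a constant multiple of $j^{-2}$. The sketch below checks one explicit version of this numerically, with the constant $4/(1 - c^{-2})$; that constant comes from our own bookkeeping (it uses $\lfloor x \rfloor \ge x/2$ for $x \ge 1$) and should be read as an assumption rather than as the text's exact statement.

```python
import math

def tail_sum(c, j, terms=200):
    """Sum of floor(c**m)**(-2) over 0 <= m < terms with floor(c**m) >= j."""
    total = 0.0
    for m in range(terms):
        b = math.floor(c**m)
        if b >= j:
            total += 1.0 / b**2
    return total

def geometric_bound(c, j):
    """Assumed bound 4 / ((1 - c**-2) * j**2) for the tail sum above."""
    return 4.0 / ((1.0 - c**-2) * j**2)

for c in (1.1, 1.5, 2.0):
    worst = max(tail_sum(c, j) / geometric_bound(c, j) for j in range(1, 50))
    print(c, worst)      # a ratio of at most 1 means the bound held for every j tested
```

Because the bound is of order $j^{-2}$, summability of $\operatorname{Var}(Y_j)/j^2$ is precisely what makes the variances along the subsequence $(b_m)$ summable.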


Problem 11. In the setting of the preceding lemma, show that $T_n/n \to E(X_1)$ i.p.
as $n \to \infty$. Hint: Use Lemma 10 to show that $\operatorname{Var}(T_n/n) \to 0$ as $n \to \infty$, then
apply Problem 7.


The preceding problem, in conjunction with Lemma 3, shows that $(T_n/n)$
(and hence $(S_n/n)$) converges almost surely to the desired limit along some
subsequence. But we need more than that, and Lemma 11 gives it to us: It shows
that this convergence occurs along certain specific subsequences $(b_m)$. As the
next lemma shows, when the random variables $X_n$ are nonnegative, convergence
along the subsequences $(b_m)$ easily implies convergence along the full sequence.


Lemma 12. Let $T_n$ and $X_n$ be as in the preceding lemma and suppose that
each $X_n$ is nonnegative. Then $T_n/n \to E(X_1)$ a.s. as $n \to \infty$.


PROOF. Let $c > 1$ and define $b_m$ as in Lemma 11. For $n = 1,2,3,\dots$ let
$M(n)$ equal the smallest integer $m$ for which $n \le c^m$. Then, by Lemma 11, for
almost every $\omega$,

$$\limsup_{n\to\infty} \frac{T_n(\omega)}{n} \le \limsup_{n\to\infty} \frac{T_{b_{M(n)}}(\omega)}{c^{-1}\, b_{M(n)}} = c\,E(X_1).$$

For an inequality in the other direction let $L(n)$ equal the largest integer $m$ for
which $c^m \le n$. Then, for almost every $\omega$,

$$\liminf_{n\to\infty} \frac{T_n(\omega)}{n} \ge \liminf_{n\to\infty} \frac{T_{b_{L(n)}}(\omega)}{c\, b_{L(n)}} = c^{-1} E(X_1).$$

Let $c$ decrease to 1 through a countable sequence to complete the proof. $\Box$


Problem 12. What purpose, if any, is served by the phrase “countable sequence” 
in the last sentence of the preceding proof? 




Problem 13. If $\operatorname{Var}(X_1) < \infty$, the preceding development can be simplified. In 
particular, truncation is not necessary. Carry out such a simplification, and, as 
you do so, weaken the assumption of identical distributions to one of identical 
means and variances. Also weaken the independence assumption to one concerning 
correlations. 


In our final lemma before the main result of this section, we drop the assump- 
tion that the means be finite. 


Lemma 13. Let $(X_n: n = 1,2,\dots)$ be a sequence of nonnegative identically
distributed pairwise-independent $\mathbb{R}$-valued random variables, and denote by
$(S_n: n = 1,2,\dots)$ the corresponding sequence of partial sums. Then, as $n \to \infty$,
$S_n/n \to E(X_1)$ a.s. Moreover, if $E(X_1) < \infty$, then $E(|S_n/n - E(X_1)|) \to 0$ as
$n \to \infty$.


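PROOF. First assume that $E(X_1) < \infty$. Then, the conclusion $S_n/n \to
E(X_1)$ a.s. follows immediately from Lemma 12 and Lemma 9. Since

$$E\Big(\Big|\frac{S_n}{n}\Big|\Big) = E\Big(\frac{S_n}{n}\Big) = E(X_1),$$

condition (iii) in the Uniform Integrability Criterion is satisfied, and, thus,
$E(|S_n/n - E(X_1)|) \to 0$.

In case $E(X_1) = \infty$ we use a truncation argument and the Monotone
Convergence Theorem:

$$
\begin{aligned}
\liminf_{n\to\infty} \frac{S_n(\omega)}{n}
&\ge \lim_{k\to\infty} \liminf_{n\to\infty} \frac{(k \wedge X_1(\omega)) + \dots + (k \wedge X_n(\omega))}{n}\\
&= \lim_{k\to\infty} E(k \wedge X_1) = E(X_1) = \infty \text{ a.s.} \qquad \Box
\end{aligned}
$$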


It remains to drop the assumption that the random variables $X_n$ be nonnegative.
We will assume that the means of the random variables $X_n$ exist (not
necessarily finite), and apply the preceding lemma separately to the positive and 
negative parts. 


Theorem 14. [Strong Law of Large Numbers] Let $(X_n: n = 1,2,\dots)$ be
a sequence of identically distributed pairwise-independent $\mathbb{R}$-valued random
variables for which $E(X_1)$ exists, and let $(S_n: n = 1,2,\dots)$ denote the corresponding
sequence of partial sums. Then, as $n \to \infty$,

$$\frac{S_n}{n} \to E(X_1) \text{ a.s.}$$

Moreover, if $|E(X_1)| < \infty$, then

$$E\Big(\Big|\frac{S_n}{n} - E(X_1)\Big|\Big) \to 0$$

as $n \to \infty$.
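
A simulation cannot prove an almost sure statement, but it does show what the theorem predicts along a single sample path. The sketch below is illustrative only: the exponential distribution with mean 2, the seed, and the helper name `running_means` are our own choices, and the summands are fully independent rather than merely pairwise independent.

```python
import random

def running_means(sample, n, seed=0):
    """Return [S_1/1, S_2/2, ..., S_n/n] for iid draws produced by sample(rng)."""
    rng = random.Random(seed)
    total, means = 0.0, []
    for k in range(1, n + 1):
        total += sample(rng)
        means.append(total / k)
    return means

# Assumed example: exponential summands with rate 1/2, so that E(X_1) = 2.
means = running_means(lambda rng: rng.expovariate(0.5), 200_000)

for n in (10, 1_000, 200_000):
    print(n, means[n - 1])
```

Along the simulated path the averages settle near 2, and rerunning with other seeds changes the early values but not the limiting behavior.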




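PROOF. Let $(U_n: n = 1,2,\dots)$ and $(V_n: n = 1,2,\dots)$ denote, respectively,
the sequences of partial sums of the sequences $(X_n^+)$ and $(X_n^-)$. By Problem 2
of Chapter 9, each of these sequences is pairwise independent. By the preceding
lemma,

$$\frac{S_n}{n} = \frac{U_n}{n} - \frac{V_n}{n} \to E(X_1^+) - E(X_1^-) = E(X_1) \text{ a.s.}$$

If, in addition, $|E(X_1)| < \infty$, then, from the triangle inequality, the positivity of
the expectation operator, and the preceding lemma, we obtain

$$
\begin{aligned}
E\Big(\Big|\frac{S_n}{n} - E(X_1)\Big|\Big)
&\le E\Big(\Big|\frac{U_n}{n} - E(X_1^+)\Big|\Big) + E\Big(\Big|\frac{V_n}{n} - E(X_1^-)\Big|\Big) \to 0. \qquad \Box
\end{aligned}
$$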


Since almost sure convergence implies convergence in probability, we have also
obtained $\lim_n S_n/n = E(X_1)$ i.p. under the assumption that $E(X_1)$ exists.


Problem 14. Compare the conclusion described in the preceding sentence with the 
Law of Large Numbers in Chapter 5. 


Problem 15. Let $(S_0 = 0, S_1, S_2, \dots)$ be the sequence of partial sums of an iid
sequence having positive (possibly infinite) mean. Prove that $S_n \to \infty$ a.s. as
$n \to \infty$.


* Problem 16. Let $(X_n: n = 1,2,\dots)$ be a sequence of identically distributed
pairwise independent $\mathbb{R}$-valued random variables, and let $(S_n: n = 1,2,\dots)$ denote the
corresponding sequence of partial sums. Assume that $E(|X_1|) = \infty$. Prove that
for all $c > 0$,

$$P(\{\omega: |X_n(\omega)| > cn \text{ for infinitely many even } n\}) = 1. \tag{12.1}$$

Conclude that

$$P\Big(\Big\{\omega: \frac{S_n(\omega)}{n} \text{ converges in } \mathbb{R} \text{ as } n \to \infty\Big\}\Big) = 0. \tag{12.2}$$


We will discover in Chapter 15 that it is possible for the hypothesis $E(|X_1|) =
\infty$ in the preceding exercise to hold and, nevertheless, for there to exist a finite
number $\mu$ such that $\lim_n S_n/n = \mu$ i.p.


12.3. Applications 


The following exercise shows that the expectation of a random variable need not 
be representative of typical values of that random variable. 


* Problem 17. [Stick-breaking random walk] Let $(X_n: n = 1,2,\dots)$ be an iid
sequence of random variables uniformly distributed on $(0,1]$. For $n = 0,1,2,\dots$, set
$S_n = \prod_{k=1}^{n} X_k$. Calculate $E(S_n)$. Compare $S_n$ and $E(S_n)$ for large $n$. Give an
intuitive explanation of the result of your comparison. Describe the sequence $(S_n)$
in terms of the successive lengths of the remaining part of a stick whose original



length was one. (The sequence (Sn) is a random walk on the state space (0, 1], 
which is a semigroup under the operation of ordinary multiplication. See Chap- 
ter 11.) Hint: In order to get an idea of how Sn behaves for large n, apply the 
Strong Law of Large Numbers to log(Sn). 


For the next example, we extend the notion of mutual singularity introduced
in Chapter 6. A family $(\mu_\alpha: \alpha \in A)$ of $\sigma$-finite measures on a measurable space
$(\Omega, \mathcal{F})$ is mutually singular if, for each $\alpha \in A$, there exists a measurable set
$B_\alpha \subseteq \Omega$ such that $\{B_\alpha: \alpha \in A\}$ is a family of pairwise disjoint sets and, for each
$\alpha \in A$, $\mu_\alpha(B_\alpha^c) = 0$.


Example 2. Fix a real number $p \in [0,1]$, and let $X = (X_1, X_2, \dots)$ be an iid
sequence of Bernoulli random variables, each with parameter $p$. As in Example 1
of Chapter 2, for each $\omega$, identify $X(\omega)$ with a point $Y(\omega)$ in $[0,1]$ by way of
binary expansions:

$$Y(\omega) = .X_1(\omega) X_2(\omega) X_3(\omega) \dots_{\text{two}}.$$

Let $Q_p$ be the distribution of $Y$.

Thus we have defined a family $(Q_p: p \in [0,1])$ of probability measures on the
Borel sets of $[0,1]$. We will use the Strong Law of Large Numbers to show that
this family is mutually singular. For each $p \in [0,1]$, let

$$B_p = \Big\{ y = .x_1 x_2 x_3 \dots_{\text{two}} : \frac{1}{n} \sum_{k=1}^{n} x_k \to p \text{ as } n \to \infty \Big\}.$$

By the Strong Law of Large Numbers, $Q_p(B_p^c) = 0$ for all $p \in [0,1]$. Since the
sets $B_p$ are pairwise disjoint, the family $(Q_p: p \in [0,1])$ is mutually singular, as
desired.

The mutual singularity just demonstrated implies that no $Q_p$ has a density
with respect to $Q_q$ if $p \ne q$. In particular, since $Q_{1/2}$ is Lebesgue measure on $[0,1]$,
no $Q_p$, $p \ne 1/2$, has a density with respect to Lebesgue measure.
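
The sets used in this example can be explored by simulation. The sketch below draws the binary digits of $Y$ as iid Bernoulli($p$) variables for a few values of $p$ and reports the observed digit frequency, which is the quantity that decides membership in the set attached to $p$. The function names, the seed, and the truncation of the expansion to 60 digits are illustrative assumptions of ours.

```python
import random

def sample_digits(p, n, rng):
    """First n binary digits X_1, ..., X_n of Y: iid Bernoulli(p)."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def value_from_digits(digits, precision=60):
    """The number .x1 x2 x3 ... in base two, truncated to `precision` digits."""
    return sum(x * 2.0**-k for k, x in enumerate(digits[:precision], start=1))

def digit_frequency(digits):
    """Proportion of ones among the given digits."""
    return sum(digits) / len(digits)

rng = random.Random(12)
frequencies = {}
for p in (0.2, 0.5, 0.8):
    digits = sample_digits(p, 100_000, rng)
    frequencies[p] = digit_frequency(digits)
    print(p, value_from_digits(digits), frequencies[p])
```

Points sampled under different values of $p$ show different limiting digit frequencies, so they land in different, pairwise disjoint sets; this is the mechanism behind the mutual singularity.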


Problem 18. For $0 < p < 1$ and $Q_p$ as defined in the preceding example, let $F_p$ be
the corresponding distribution function. Show that $F_p$ is continuous and strictly
increasing on $[0,1]$, $F_p(0) = 0$, $F_p(1) = 1$, and

$$
F_p(x) =
\begin{cases}
(1 - p) F_p(2x) & \text{if } 0 \le x \le 1/2\\
1 - p + p F_p(2x - 1) & \text{if } 1/2 < x \le 1.
\end{cases}
$$


* Problem 19. Use the preceding exercise to calculate $F_p(3/4)$, $F_p(3/8)$, $F_p(1/3)$,
and $F_p(7/10)$. Check each of the first three answers by interpreting it as the
probability of the union of certain disjoint events each of whose probability is 
easily calculated. 




Problem 20. Let Y be as defined in Example 2. Write Y as an infinite sum of 
independent random variables, and use this representation to calculate the mean 
and variance of $Y$ under the various probability measures $P_p$. Also calculate the
mean and variance of $Y$ directly using the distribution function $F_p$.


Consider a product space $(\Omega, \mathcal{F}, P) = (\Psi, \mathcal{G}, Q)^{\infty}$ and write a member $\omega$ of
$\Omega$ as $\omega = (\psi_1, \psi_2, \dots)$. Fix $B \in \mathcal{G}$ and define $X_n(\omega) = I_B(\psi_n)$ for $n = 1,2,\dots$.
We can think of the sequence $(\psi_n: n = 1,2,\dots)$ as representing the outcomes
of repeated trials in an experiment, with the random variable $S_n = \sum_{k=1}^{n} X_k$
counting the number of times in the first $n$ trials that an outcome is in the set
$B$, or stated more informally, “the number of times that $B$ occurs by time $n$”.
Applying the Strong Law of Large Numbers, we conclude that the proportion of
times that $B$ occurs by time $n$ converges a.s. to $Q(B)$ as $n \to \infty$.

The following result is both an application and an extension of the ideas 
introduced in the preceding paragraph. The function Fn, defined in the theorem 
in terms of an iid sequence, is often called the empirical distribution function of 
the sequence. 


Theorem 15. [Glivenko-Cantelli] Let $(X_n: n = 1,2,\dots)$ be an iid sequence of
$\mathbb{R}$-valued random variables defined on a probability space $(\Omega, \mathcal{F}, P)$ and having a
common distribution function $F$. For each $x \in \mathbb{R}$, let $J_x$ be the indicator function
of the interval $(-\infty, x]$, and define random variables

$$F_n(x, \omega) = \frac{1}{n} \sum_{k=1}^{n} J_x(X_k(\omega))$$

for $\omega \in \Omega$ and $n = 1,2,\dots$. Then

$$\lim_{n\to\infty} \sup_{x \in \mathbb{R}} |F_n(x) - F(x)| = 0 \text{ a.s.}$$
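
The uniform distance in this theorem can be computed exactly for a finite sample, because the empirical distribution function is a step function that jumps only at the sample points. The sketch below does this for samples from the uniform distribution on $[0,1]$, where $F(x) = x$; the choice of distribution, the seed, and the function name are assumptions made for illustration.

```python
import random

def sup_distance_uniform(sample):
    """sup over x of |F_n(x) - x| for a sample from the uniform law on [0, 1].

    F_n is constant between order statistics and jumps by 1/n at each of them,
    so the supremum is approached just before a jump or attained at one.
    """
    xs = sorted(sample)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))

rng = random.Random(1)
distances = {}
for n in (100, 10_000, 100_000):
    distances[n] = sup_distance_uniform([rng.random() for _ in range(n)])
    print(n, distances[n])
```

The printed distances shrink as the sample grows, in line with the almost sure uniform convergence asserted by the theorem.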


Problem 21. Prove the preceding theorem. Hint: The discussion in the paragraph
preceding the statement of the theorem shows that for each fixed $x$, $F_n(x) \to
F(x)$ a.s. as $n \to \infty$. Take the intersection of the relevant events for $x$ rational,
and then use the monotonicity of $F$ and of each $F_n(\cdot, \omega)$ as functions of $x$ to get
the desired result.


Problem 22. Describe the changes, if any, needed in Theorem 15 to accommodate
$\overline{\mathbb{R}}$-valued random variables.


In Figure 11.4 of Chapter 11 the distribution function of the time of first 
return to 0 of a simple symmetric random walk is shown. Also shown, in dashes, 
is the empirical distribution function obtained by having 32 people each flip a 
coin until the number of heads thus far obtained was equal to the number of tails 
thus far obtained or until the coin had been flipped 80 times, whichever came 
first. To aid in visualization, the jumps in both graphs have been filled in. 




12.4. 0-1 laws 


In the Strong Law of Large Numbers we are interested in the event

$$A = \Big\{\omega: \frac{S_n(\omega)}{n} \to E(X_1) \text{ as } n \to \infty\Big\},$$

the occurrence of which is determined by an independent sequence $X = (X_n: n \ge
1)$ of random variables. Note that the occurrence of $A$ does not depend on the
values along any given finite subsequence of $X$. We are about to see that such
events always have probability one or zero. We first need a definition.


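Definition 16. Let $(\mathcal{G}_m: m = 1,2,\dots)$ be a sequence of sub-$\sigma$-fields of a
$\sigma$-field $\mathcal{F}$. For each positive integer $n$ let

$$\mathcal{H}_n = \sigma(\mathcal{G}_m: m \ge n).$$

The $\sigma$-field

$$\mathcal{T} = \bigcap_{n=1}^{\infty} \mathcal{H}_n$$

is called the tail $\sigma$-field of the sequence $(\mathcal{G}_m)$, and the members of $\mathcal{T}$ are tail
events.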


The following theorem reflects the fact that the tail $\sigma$-field of an infinite
independent sequence of $\sigma$-fields is independent of itself.


Theorem 17. [Kolmogorov 0-1 Law] Each tail event of an infinite independent
sequence of $\sigma$-fields has probability 0 or 1.


PROOF. Let $(\mathcal{G}_m: m = 1,2,\dots)$ denote an infinite independent sequence of
$\sigma$-fields, and let $\mathcal{H}_n$ and $\mathcal{T}$ be defined as in Definition 16. By Proposition 4
of Chapter 9, $(\mathcal{H}_{m+1}, \mathcal{G}_1, \dots, \mathcal{G}_m)$ is an independent finite sequence for each $m$.
Since $\mathcal{T} \subseteq \mathcal{H}_{m+1}$, we see that $(\mathcal{T}, \mathcal{G}_1, \dots, \mathcal{G}_m)$ is independent for each $m$. Hence
the infinite sequence $(\mathcal{T}, \mathcal{G}_1, \mathcal{G}_2, \dots)$ is independent. Applying Proposition 4 of
Chapter 9 again, we conclude that $(\mathcal{T}, \mathcal{H}_1)$ is an independent pair. Since $\mathcal{T} \subseteq \mathcal{H}_1$,
the pair $(\mathcal{T}, \mathcal{T})$ is independent. The result now follows from Problem 11 of
Chapter 9. $\Box$


Corollary 18. Any random variable measurable with respect to the tail
$\sigma$-field of an infinite independent sequence of $\sigma$-fields is equal to some constant a.s.


Problem 23. Let $(S_n: n = 0,1,2,\dots)$ be the sequence of partial sums of an
independent (not necessarily identically distributed) sequence of $\mathbb{R}$-valued random
variables. Prove that the $\overline{\mathbb{R}}$-valued random variables $\limsup(S_n/n)$ and $\liminf(S_n/n)$
each equal constants a.s.




Problem 24. Continue with the notation of the preceding exercise and, in addition,
assume that the summands are identically distributed and do not have a mean
(finite or infinite). Use Problem 16 to prove that $\limsup(S_n/n)$ and $\liminf(S_n/n)$
cannot both be finite. In addition, prove that $\limsup(S_n/n) = -\liminf(S_n/n) =
\infty$ if the distribution of the steps is symmetric with respect to some real number.


Problem 25. Find an independent sequence (X1, X2,...) of random variables such 
that 
Pianale eero 5 l 
IL OO 

Problem 26. Let $(S_0 = 0, S_1, S_2, \dots)$ be a random walk in $\mathbb{R}$ and let $(a_n > 0: n =
0,1,2,\dots)$ be an increasing sequence with limit $\infty$. Use the Kolmogorov 0-1 Law
to show that

$$\limsup_{n\to\infty} \frac{S_n}{a_n}$$

is an almost surely constant $\overline{\mathbb{R}}$-valued random variable. Let $c$ denote its value.
Describe the difficulty in trying to use the Kolmogorov 0-1 Law to prove that the
event

$$\limsup_{n\to\infty}\, [S_n > c\,a_n]$$

has probability 0 or 1. Then use the forthcoming Theorem 19 to prove that it does
have probability 0 or 1.


We say that a $\sigma$-field of events is 0-1 trivial if every event in it has probability
0 or 1. Of course, the trivial $\sigma$-field $\{\emptyset, \Omega\}$ is also 0-1 trivial. The Kolmogorov
0-1 Law says that the tail $\sigma$-field of an infinite independent sequence of $\sigma$-fields
is 0-1 trivial. The focus of the remainder of this section is on another law that
describes a 0-1 trivial $\sigma$-field.

Let $\Psi$ denote any set. A permutation $\pi$ on $\mathbb{Z}^+ \setminus \{0\}$ induces a permutation $\hat{\pi}$
of $\bigotimes_{n=1}^{\infty} \Psi$ via the formula

$$\hat{\pi}(\psi_1, \psi_2, \dots) = (\psi_{\pi^{-1}(1)}, \psi_{\pi^{-1}(2)}, \dots).$$

A subset $A$ of $\bigotimes_{n=1}^{\infty} \Psi$ is exchangeable if $\hat{\pi}(A) = A$ for all permutations $\pi$ of
$\mathbb{Z}^+ \setminus \{0\}$ such that $\pi(k) = k$ for all but finitely many $k$. Let $\mathcal{G}$ denote a $\sigma$-field
of subsets of $\Psi$. It is easy to see that the collection of all exchangeable members
of $\bigotimes_{n=1}^{\infty} \mathcal{G}$ is a $\sigma$-field; it is called the exchangeable $\sigma$-field in $\bigotimes_{n=1}^{\infty} (\Psi, \mathcal{G})$.


Theorem 19. [Hewitt-Savage 0-1 Law] Let

$$(\Omega, \mathcal{F}, P) = \prod_{n=1}^{\infty} (\Psi, \mathcal{G}, Q),$$

where $(\Psi, \mathcal{G}, Q)$ is a probability space. Then the exchangeable $\sigma$-field is 0-1 trivial.


* Problem 27. Prove the preceding theorem. Hint: Use Lemma 18 of Chapter 9. 




Problem 28. Let $X_n: (\Omega, \mathcal{F}, P) \to (\Psi, \mathcal{G})$ be independent identically distributed
random variables and set $X = (X_1, X_2, \dots)$. On what basis do we know that

$$\Big\{X^{-1}(A): A \text{ exchangeable in } \bigotimes_{n=1}^{\infty} \mathcal{G}\Big\}$$

is a sub-$\sigma$-field of $\mathcal{F}$? Prove that it, known as the exchangeable $\sigma$-field in $\mathcal{F}$ induced
by $X$, is 0-1 trivial (with respect to the probability measure $P$).


Problem 29. Let $X_n: (\Omega, \mathcal{F}, P) \to (\Psi_n, \mathcal{G}_n)$ be independent random variables and
set $X = (X_1, X_2, \dots)$. On what basis do we know that

$$\Big\{X^{-1}(B): B \text{ tail in } \bigotimes_{n=1}^{\infty} \mathcal{G}_n\Big\}$$

is a sub-$\sigma$-field of $\mathcal{F}$? Prove that it, known as the tail $\sigma$-field in $\mathcal{F}$ induced by $X$,
is 0-1 trivial (with respect to the probability measure $P$).


In view of Problem 28, we can, when discussing random walks, speak of
exchangeable events and the exchangeable $\sigma$-field whether or not the underlying
probability space is a product space. Similarly, when treating sequences of in- 
dependent random variables, tail events may play a role even if the underlying 
probability space is not a product space. 


* Problem 30. Let $(S_0 = 0, S_1, S_2, \dots)$ be a random walk the support of whose step
distribution is $\mathbb{Z}$. With respect to the sequence of steps, decide, in each case,
whether the given event is a tail event and whether it is an exchangeable event:

(i) $\limsup_{n\to\infty}\,[S_n > 0]$;
(ii) $\liminf_{n\to\infty}\,[S_n = S_{2n}]$;
(iii) $\limsup_{n\to\infty}\,[S_n = S_{2n}]$;


12.5. Random infinite series 


This section and the two that follow it are concerned with the a.s. convergence 
of infinite series of independent $\mathbb{R}$-valued random variables. When we speak
of convergence we will mean convergence within $\mathbb{R}$ (as opposed to convergence
within $\overline{\mathbb{R}}$). The problem is rather uninteresting if the random variables are
identically distributed, since in that case the sequence of partial sums will, with 
probability 1, fail to approach a finite limit, unless the summands equal 0 a.s. 


Problem 31. Prove the assertion made in the preceding sentence. 


Problem 32. Let $(X_1, X_2, \dots)$ be an independent sequence of random variables
with

$$P(\{\omega: X_n(\omega) = 2^{-n}\}) = P(\{\omega: X_n(\omega) = 0\}) = 1/2.$$

Prove that $\sum X_n$ converges a.s. and calculate the distribution function of the limit.




Problem 33. Let $(X_1, X_2, \dots)$ be an independent sequence of random variables
with

$$P(\{\omega: X_n(\omega) = 1\}) = 1 - P(\{\omega: X_n(\omega) = 0\}) = 1/n.$$

Prove that $\sum X_n$ diverges a.s.


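Problem 34. Let $(X_1, X_2, \dots)$ be an independent sequence of random variables
with

$$P(\{\omega: X_n(\omega) = 1\}) = 1 - P(\{\omega: X_n(\omega) = 0\}) = 2^{-n}.$$

Prove that $\sum X_n$ converges a.s. and that this infinite series equals 0 with
probability

$$\prod_{n=1}^{\infty} (1 - 2^{-n}) > 0,$$

and equals 1 with probability

$$\Big(\sum_{n=1}^{\infty} \frac{2^{-n}}{1 - 2^{-n}}\Big) \prod_{n=1}^{\infty} (1 - 2^{-n}).$$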
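
The two probabilities in this problem can be cross-checked numerically without giving away the proof. The sketch below assumes that the target formulas are $\prod_n (1 - 2^{-n})$ for the value 0 and $\big(\sum_n 2^{-n}/(1 - 2^{-n})\big) \prod_n (1 - 2^{-n})$ for the value 1, and compares them with the exact distribution of the first 60 terms obtained by repeated convolution. The truncation at 60 terms and the function name are our own choices.

```python
def distribution_of_partial_sum(N):
    """Exact law of X_1 + ... + X_N when P(X_n = 1) = 2**-n, by convolution."""
    dist = [1.0]                          # dist[s] = P(partial sum == s)
    for n in range(1, N + 1):
        q = 2.0**-n
        new = [0.0] * (len(dist) + 1)
        for s, pr in enumerate(dist):
            new[s] += pr * (1.0 - q)      # X_n = 0
            new[s + 1] += pr * q          # X_n = 1
        dist = new
    return dist

N = 60                                    # terms beyond this change nothing at double precision
dist = distribution_of_partial_sum(N)

product, series = 1.0, 0.0
for n in range(1, N + 1):
    product *= 1.0 - 2.0**-n
    series += 2.0**-n / (1.0 - 2.0**-n)

print(dist[0], product)
print(dist[1], series * product)
```

The agreement of the two columns is only a sanity check on the formulas; the almost sure convergence itself still requires an argument such as the Borel Lemma.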


* Problem 35. Let $(X_1, X_2, \dots)$ be an independent sequence of $\mathbb{R}$-valued random
variables, and suppose that

$$P(\{\omega: |X_n(\omega)| < 1/n^2\}) > 1 - (1/n^2)$$

for all $n$. Prove that $\sum X_n$ converges a.s.


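Problem 36. Find an independent sequence $(X_1, X_2, \dots)$ of nonnegative random
variables such that for each $n$,

$$E(X_n) < \infty, \qquad \sum_{n=1}^{\infty} X_n < \infty \text{ a.s.}, \qquad \text{and} \qquad E\Big(\sum_{n=1}^{\infty} X_n\Big) = \infty.$$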


The following result says that it is not by chance that in each of the preceding 
exercises, there is either almost sure convergence or almost sure divergence of 
the infinite series. 


Proposition 20. A series $\sum X_n$ of independent $\mathbb{R}$-valued random variables
either converges (in $\mathbb{R}$) a.s. or diverges a.s.


Problem 37. Prove the preceding proposition by using the Kolmogorov 0-1 Law. 


Problem 38. Show that the assumption of independence may not be deleted from 
the preceding proposition. 




12.6. The Etemadi Lemma 


In view of the preceding proposition it is natural to search for conditions that 
distinguish between the two cases—a.s. convergence and a.s. divergence. In the 
section following this section, we will obtain necessary and sufficient conditions 
on the distributions of independent $\mathbb{R}$-valued random variables $X_n$ for the series
$\sum X_n$ to converge a.s. In the present section, we lay some groundwork by proving
that a.s. convergence is equivalent to i.p. convergence for such a series. This 
equivalence is a consequence of the following important lemma which is useful for 
many purposes besides that of obtaining conditions for almost sure convergence. 


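Lemma 21. [Etemadi] Let $(X_1, X_2, \dots)$ be an independent sequence of
$\mathbb{R}$-valued random variables, and let $S_m = \sum_{i=1}^{m} X_i$ for $m \ge 0$. Then, for each
$\varepsilon > 0$,

$$P(\{\omega: \sup_m |S_m(\omega)| > 4\varepsilon\}) \le 4 \sup_m P(\{\omega: |S_m(\omega)| > \varepsilon\}), \tag{12.3}$$

and for each positive integer $n$,

$$P(\{\omega: \max_{m \le n} |S_m(\omega)| > 4\varepsilon\}) \le 4 \max_{m \le n} P(\{\omega: |S_m(\omega)| > \varepsilon\}). \tag{12.4}$$

PROOF. It is enough to prove (12.4), since (12.3) follows from (12.4) and the
Continuity of Measure Theorem.
Let $S_0 = 0$, and for $m = 1,2,\dots,n$, let

$$A_m = \{\omega: \max_{j < m} |S_j(\omega)| \le 4\varepsilon < |S_m(\omega)|\}.$$

The left side of (12.4) equals

$$
\begin{aligned}
\sum_{m=1}^{n} P(A_m)
&= \sum_{m=1}^{n} P(A_m \cap \{\omega: |S_n(\omega)| > 2\varepsilon\}) + \sum_{m=1}^{n} P(A_m \cap \{\omega: |S_n(\omega)| \le 2\varepsilon\})\\
&\le P(\{\omega: |S_n(\omega)| > 2\varepsilon\}) + \sum_{m=1}^{n-1} P(A_m \cap \{\omega: |S_n(\omega) - S_m(\omega)| > 2\varepsilon\})\\
&= P(\{\omega: |S_n(\omega)| > 2\varepsilon\}) + \sum_{m=1}^{n-1} P(A_m)\, P(\{\omega: |S_n(\omega) - S_m(\omega)| > 2\varepsilon\})\\
&\le P(\{\omega: |S_n(\omega)| > 2\varepsilon\}) + \max_{1 \le m < n} P(\{\omega: |S_n(\omega) - S_m(\omega)| > 2\varepsilon\})\\
&\le 2 \max_{0 \le m < n} P(\{\omega: |S_n(\omega) - S_m(\omega)| > 2\varepsilon\})\\
&\le 2 \max_{0 \le m < n} \big[P(\{\omega: |S_m(\omega)| > \varepsilon\}) + P(\{\omega: |S_n(\omega)| > \varepsilon\})\big]\\
&\le 4 \max_{m \le n} P(\{\omega: |S_m(\omega)| > \varepsilon\}). \qquad \Box
\end{aligned}
$$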
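
For a small example the two sides of the maximal inequality (12.4) can be computed exactly by enumerating all paths. The sketch below does this for steps that are uniform on $\{-1, +1\}$; the step distribution, the strict inequalities in the events, and the function name are assumptions made for illustration, and the enumeration is only feasible for short walks.

```python
from fractions import Fraction
from itertools import product

def etemadi_sides(n, eps):
    """Exact (left, right) sides of the maximal inequality for n fair +-1 steps.

    left  = P(max over m <= n of |S_m| > 4*eps)
    right = 4 * max over m <= n of P(|S_m| > eps)
    """
    paths = list(product((-1, 1), repeat=n))
    weight = Fraction(1, len(paths))
    exceed_max = 0                        # paths whose running maximum exceeds 4*eps
    exceed_at = [0] * (n + 1)             # exceed_at[m]: paths with |S_m| > eps
    for steps in paths:
        s, running_max = 0, 0
        for m, x in enumerate(steps, start=1):
            s += x
            running_max = max(running_max, abs(s))
            if abs(s) > eps:
                exceed_at[m] += 1
        if running_max > 4 * eps:
            exceed_max += 1
    return exceed_max * weight, 4 * max(exceed_at) * weight

results = {}
for eps in (Fraction(1, 2), Fraction(1), Fraction(3, 2)):
    results[eps] = etemadi_sides(12, eps)
    print(eps, results[eps])
```

In each case the left side is no larger than the right side, as the lemma requires. The right side may exceed 1, in which case the inequality carries no information; the lemma is useful when the individual probabilities are small.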




Theorem 22. A series $\sum X_n$ of independent $\mathbb{R}$-valued random variables
converges a.s. to an $\mathbb{R}$-valued random variable $Z$ if and only if $\sum X_n$ converges i.p.
to $Z$.


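PROOF. The ‘only if’ assertion follows from Theorem 2. To prove the ‘if’
assertion, fix $\varepsilon > 0$, let $S_n = \sum_{m=1}^{n} X_m$ for $n \ge 0$, and suppose that $S_n \to Z$ in
probability as $n \to \infty$. By the Continuity of Measure Theorem,

$$
\begin{aligned}
&P(\{\omega: \limsup_{n\to\infty} |S_n(\omega) - Z(\omega)| > 8\varepsilon\}) &&\text{(12.5)}\\
&\qquad \le \lim_{m\to\infty} P(\{\omega: \sup_{n > m} |S_n(\omega) - Z(\omega)| > 8\varepsilon\})\\
&\qquad \le \limsup_{m\to\infty} P(\{\omega: |S_m(\omega) - Z(\omega)| > 4\varepsilon\}) &&\text{(12.6)}\\
&\qquad\quad + \limsup_{m\to\infty} P(\{\omega: \sup_{n > m} |S_n(\omega) - S_m(\omega)| > 4\varepsilon\}). &&\text{(12.7)}
\end{aligned}
$$

Since $S_n$ converges to $Z$ in probability, (12.6) equals 0. We obtain an upper
bound for (12.7) by applying (12.3) of the Etemadi Lemma to the sequence
$(X_{m+1}, X_{m+2}, \dots)$; the bound is

$$
\begin{aligned}
&4 \limsup_{m\to\infty} \sup_{n > m} P(\{\omega: |S_n(\omega) - S_m(\omega)| > \varepsilon\})\\
&\qquad \le 4 \limsup_{m\to\infty} P(\{\omega: |S_m(\omega) - Z(\omega)| > \varepsilon/2\})\\
&\qquad\quad + 4 \limsup_{m\to\infty} \sup_{n > m} P(\{\omega: |S_n(\omega) - Z(\omega)| > \varepsilon/2\}),
\end{aligned}
$$

which equals 0 as a consequence of the convergence of $S_n$ to $Z$ in probability.
Thus (12.5) equals 0 for every $\varepsilon$. Now apply Problem 1. $\Box$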


In order to apply the preceding result to relate the two types of convergence in 
the present setting, we need to develop a criterion for convergence in probability 
that does not require us to have a candidate for the limiting random variable. 


Definition 23. A sequence $(W_n: n = 1,2,\dots)$ of $\mathbb{R}$-valued random variables
defined on a probability space $(\Omega, \mathcal{F}, P)$ is Cauchy in probability if for every $\varepsilon > 0$
there exists an integer $l$ such that

$$P(\{\omega: |W_n(\omega) - W_m(\omega)| > \varepsilon\}) < \varepsilon$$

whenever $n, m \ge l$.


Lemma 24. A sequence (Wn: n = 1,2,...) of R-valued random variables 
defined on a common probability space converges in probability to an R-valued 
random variable if and only if it is Cauchy in probability. 


PARTIAL PROOF. Suppose that the sequence $(W_n)$ is Cauchy in probability.
By definition we may choose a subsequence $(W_{n_k} : n_1 < n_2 < \cdots)$ such that

\[ P(\{\omega : |W_{n_{k+1}}(\omega) - W_{n_k}(\omega)| > 2^{-k}\}) < 2^{-k}. \]

The sum of these probabilities is finite, so by the Borel Lemma,

\[ |W_{n_{k+1}}(\omega) - W_{n_k}(\omega)| \le 2^{-k} \]

for almost every $\omega$ and all sufficiently large $k$ depending on $\omega$. For such $\omega$,
repeated applications of the triangle inequality imply that

\[ |W_{n_{k+m}}(\omega) - W_{n_k}(\omega)| < 2^{-k+1} \]

for all positive $m$ and sufficiently large $k$. Thus, the sequence
$(W_{n_k}(\omega) : k = 1, 2, \dots)$ is a Cauchy sequence of real numbers and has a finite
limit $W(\omega)$. Thus $W$ is defined a.s. Set $W(\omega)$ equal to 0 for any $\omega$ for which
$W(\omega)$ has not been defined by the preceding discussion.

We have a subsequence of $(W_n : n = 1, 2, \dots)$ that converges almost surely
to $W$ and thus in probability to $W$. To see that the full sequence converges in
probability to $W$ we note that, for $n > n_k$,

\begin{align*}
P(\{\omega : |W_n(\omega) - W(\omega)| > \varepsilon\})
&\le P(\{\omega : |W_n(\omega) - W_{n_k}(\omega)| > \varepsilon/2\}) + P(\{\omega : |W_{n_k}(\omega) - W(\omega)| > \varepsilon/2\}) \\
&< \varepsilon
\end{align*}

for sufficiently large $k$.

The proof that convergence in probability implies Cauchy in probability is left
for the next problem. □


Problem 39. Complete the proof of the preceding lemma by showing that a sequence
that converges in probability to an R-valued limit is Cauchy in probability.


12.7. + The Kolmogorov Three-Series Theorem 


We now search for conditions on the distributions of the individual summands $X_n$
that will be necessary and sufficient for the almost sure convergence of $\sum_n X_n$,
or, equivalently, its convergence in probability. Fix $b > 0$, and, for $n \ge 1$, define

\[ Y_n(\omega) = \begin{cases} X_n(\omega) & \text{if } |X_n(\omega)| \le b \\ 0 & \text{otherwise.} \end{cases} \]


If

\[ \sum_n P(\{\omega : X_n(\omega) \ne Y_n(\omega)\}) = \sum_n P(\{\omega : |X_n(\omega)| > b\}) = \infty, \]

then it follows from the Borel-Cantelli Lemma that with probability 1, infinitely
many of the $X_n$'s will be larger than $b$ in absolute value, and the series $\sum_n X_n$
will diverge a.s. On the other hand, if

\[ \sum_n P(\{\omega : X_n(\omega) \ne Y_n(\omega)\}) < \infty, \]

then it follows from the Borel Lemma that with probability 1, only finitely many
of the summands in the series $\sum_n X_n$ differ from the corresponding summands
in the series $\sum_n Y_n$. Therefore, in this case, $\sum_n X_n$ converges a.s. if and only if
$\sum_n Y_n$ converges a.s. In this way, we reduce the study of arbitrary series of
independent R-valued random variables to the study of series with finite variances.


Proposition 25. Let $(Z_1, Z_2, \dots)$ be an independent sequence of R-valued
random variables with finite variance. If $\sum_n \mathrm{Var}(Z_n) < \infty$, then
$\sum_n (Z_n - E(Z_n))$ converges a.s. in R.


PROOF. We may assume without loss of generality that $E(Z_n) = 0$ for all $n$.
For each $n$, let $S_n = \sum_{m=1}^{n} Z_m$. Let $\varepsilon > 0$. Choose an integer $l$ such that, for
$n > m > l$,

\[ \mathrm{Var}(S_n - S_m) = \sum_{k=m+1}^{n} \mathrm{Var}(Z_k) < \varepsilon^3. \]

By the Chebyshev Inequality,

\[ P(\{\omega : |S_n(\omega) - S_m(\omega)| > \varepsilon\}) \le \frac{\mathrm{Var}(S_n - S_m)}{\varepsilon^2} < \varepsilon. \]

Thus, the sequence $(S_n : n = 1, 2, \dots)$ is Cauchy in probability. By Lemma 24
it converges in probability. By Theorem 22 it converges almost surely. □


Corollary 26. Let $(Z_1, Z_2, \dots)$ be an independent sequence of R-valued random
variables with finite variance. If $\sum_n \mathrm{Var}(Z_n) < \infty$, then $\sum_n Z_n$ converges
or diverges a.s. according as $\sum_n E(Z_n)$ converges or diverges.


Corollary 26 and the discussion preceding Proposition 25 constitute a proof
of the first ‘if’ assertion of the following result.


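Theorem 27. [Kolmogorov Three-Series] Let $(X_1, X_2, \dots)$ be an independent
sequence of R-valued random variables, and let $b$ be a positive real number.
Define

\[ Y_n(\omega) = \begin{cases} X_n(\omega) & \text{if } |X_n(\omega)| \le b \\ 0 & \text{otherwise.} \end{cases} \]

Then $\sum_n X_n$ converges a.s. if and only if the following three series converge:

\begin{align}
& \sum_n P(\{\omega : X_n(\omega) \ne Y_n(\omega)\}), \tag{12.8} \\
& \sum_n E(Y_n), \tag{12.9} \\
& \sum_n \mathrm{Var}(Y_n). \tag{12.10}
\end{align}

If one of these three series diverges, then $\sum_n X_n$ diverges a.s.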
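
As a small illustration of how the terms of the three series are computed, the following Python sketch evaluates the summands of (12.8), (12.9), and (12.10) exactly for a random variable with finitely many values. The names `three_series_terms` and `x_dist`, and the example distribution itself, are assumptions made for this illustration; they do not come from the text.

```python
from fractions import Fraction

def three_series_terms(dist, b):
    """Summands of (12.8)-(12.10) for one X with a finite distribution.

    dist maps each value of X to its probability; Y = X if |X| <= b, else 0.
    Returns (P(X != Y), E(Y), Var(Y)).
    """
    p_diff = sum(p for x, p in dist.items() if abs(x) > b)
    mean_y = sum(x * p for x, p in dist.items() if abs(x) <= b)
    second_moment_y = sum(x * x * p for x, p in dist.items() if abs(x) <= b)
    return p_diff, mean_y, second_moment_y - mean_y ** 2

def x_dist(n):
    """Hypothetical example (n >= 2): X_n = n with probability 1/n^2,
    and X_n = +1/n or -1/n with equal probability otherwise."""
    big = Fraction(1, n * n)
    small = (1 - big) / 2
    return {Fraction(n): big, Fraction(1, n): small, Fraction(-1, n): small}

# Partial sums of the three series with b = 1, over n = 2, ..., 200.
partial = [sum(col) for col in zip(*(three_series_terms(x_dist(n), 1)
                                     for n in range(2, 201)))]
```

With $b = 1$ the summands in this example are $1/n^2$, $0$, and $(1 - 1/n^2)/n^2$, so all three series converge and the theorem gives a.s. convergence of $\sum_n X_n$ for this particular sequence.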




PROOF. As stated above, the first ‘if’ assertion has already been proved. 
Once the ‘only if’ assertion is proved, the second ‘if’ assertion is an immediate 
consequence of Proposition 20. Thus we will focus our attention on the proof of 
the ‘only if’ assertion. 


Suppose that $\sum_n X_n$ converges a.s. Then $X_n \to 0$ a.s.; so, for almost every
$\omega$, $X_n(\omega) = Y_n(\omega)$ for all but finitely many $n$. Therefore, $\sum_n Y_n$ converges a.s.,
and furthermore, as previously observed, (12.8) must converge as a consequence
of the Borel-Cantelli Lemma. It follows from Corollary 26 that, to finish the
proof, it is enough to show that (12.10) converges.

After possibly enlarging the original probability space, we may assume that
there is a sequence $(Y_1', Y_2', \dots)$ that has the same distribution as and is
independent of the sequence $(Y_1, Y_2, \dots)$. Since the two sequences have the same
distribution, $\sum_n Y_n'$ converges a.s., and hence so does the series $\sum_n (Y_n - Y_n')$.
Let $Z_n = Y_n - Y_n'$. For each $n$, $\mathrm{Var}(Z_n) = 2\,\mathrm{Var}(Y_n)$, since $Y_n'$ has the same
distribution as and is independent of $Y_n$. Thus, $\sum_n \mathrm{Var}(Y_n)$ converges if and only
if $\sum_n \mathrm{Var}(Z_n)$ converges. Note that the random variables $Z_n$ are independent,
bounded by $2b$, and have mean 0. Therefore, we have constructed an independent
sequence $(Z_1, Z_2, \dots)$ of uniformly bounded random variables with mean 0,
such that $\sum_n Z_n$ converges a.s., and $\sum_n \mathrm{Var}(Y_n)$ is finite whenever $\sum_n \mathrm{Var}(Z_n)$
is finite. Thus, in order to complete the proof, it is enough to show that the a.s.
convergence of the series $\sum_n Z_n$ implies that the series $\sum_n \mathrm{Var}(Z_n)$ has a finite
sum.

Let $S_n = \sum_{i=1}^{n} Z_i$. Because each of the random variables $Z_i$ has mean 0,
$E(S_n^2) = \mathrm{Var}(S_n) = \sum_{i=1}^{n} \mathrm{Var}(Z_i)$. By removing all $Z_i$'s that equal 0 a.s., we
may further assume that $\mathrm{Var}(S_1) > 0$.


We wish to prove that the increasing sequence $(\mathrm{Var}(S_1), \mathrm{Var}(S_2), \dots)$ has a
finite limit. (For a nice alternative approach to this part of the proof, using an
important tool that generalizes probability generating functions, see Problem 21
of Chapter 13.) In order to do so, we apply the inequality in Corollary 5 of
Chapter 5, with $\lambda = 1/2$ and $X = S_n^2$, and obtain

\[ P(\{\omega : S_n^2(\omega) > \tfrac{1}{2} \mathrm{Var}(S_n)\}) \ge \frac{(\mathrm{Var}(S_n))^2}{4 E(S_n^4)}. \tag{12.11} \]

We will prove that the right side of (12.11) does not converge to 0 as $n \to \infty$.
Since the convergence in R of $S_n$ implies the convergence in R of $S_n^2$, we can
then immediately conclude that the sequence $(\mathrm{Var}(S_n))$ has a finite limit. In the
following calculations, we will use the two facts that $E(Z_i Z_j^3) = E(Z_i)E(Z_j^3) = 0$
and $E(Z_i^2 Z_j^2) = E(Z_i^2)E(Z_j^2)$ for $i \ne j$, both of which follow from the
independence of the $Z_i$'s:


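\begin{align*}
E(S_n^4) &= \sum_{i=1}^{n} E(Z_i^4) + 6 \sum_{1 \le i < j \le n} E(Z_i^2 Z_j^2) \\
&= \sum_{i=1}^{n} E(Z_i^4) + 6 \sum_{1 \le i < j \le n} E(Z_i^2) E(Z_j^2) \\
&\le \sum_{i=1}^{n} 4b^2 E(Z_i^2) + 3 \sum_{i=1}^{n} \sum_{j=1}^{n} E(Z_i^2) E(Z_j^2) \\
&= 4b^2 \sum_{i=1}^{n} E(Z_i^2) + 3 \Bigl( \sum_{i=1}^{n} E(Z_i^2) \Bigr)^2 \\
&= 4b^2\, \mathrm{Var}(S_n) + 3 (\mathrm{Var}(S_n))^2 \\
&\le \bigl[ (4b^2 / \mathrm{Var}(S_1)) + 3 \bigr] (\mathrm{Var}(S_n))^2 .
\end{align*}

It follows that $E(S_n^4)$ is bounded above by a constant multiple of $(\mathrm{Var}(S_n))^2$,
uniformly in $n$. Thus, the right side of (12.11) stays bounded away from 0 as
$n \to \infty$, as desired. □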
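
The fourth-moment expansion used above can be checked by brute force in a simple special case. The sketch below assumes, purely for illustration, that each $Z_i$ takes the values $\pm c_i$ with probability $1/2$ each (so that $|Z_i| \le 2b$ with $2b = \max_i c_i$); the function names are hypothetical. It computes $E(S_n^4)$ exactly by enumerating all sign patterns and compares it with the expansion obtained from independence.

```python
from fractions import Fraction
from itertools import product

def fourth_moment_of_sum(cs):
    """Exact E(S^4) for S = Z_1 + ... + Z_n, with Z_i uniform on {-c_i, +c_i}
    and independent, computed by enumerating all 2^n sign patterns."""
    total = Fraction(0)
    for signs in product((-1, 1), repeat=len(cs)):
        s = sum(sign * c for sign, c in zip(signs, cs))
        total += Fraction(s) ** 4
    return total / 2 ** len(cs)

def independence_expansion(cs):
    """sum_i E(Z_i^4) + 6 * sum_{i<j} E(Z_i^2) E(Z_j^2) for the same Z_i."""
    sq = [Fraction(c) ** 2 for c in cs]          # E(Z_i^2) = c_i^2
    diagonal = sum(q * q for q in sq)            # E(Z_i^4) = c_i^4
    cross = sum(sq[i] * sq[j]
                for i in range(len(sq)) for j in range(i + 1, len(sq)))
    return diagonal + 6 * cross

cs = [Fraction(1), Fraction(1, 2), Fraction(1, 3)]
variance = sum(c * c for c in cs)                # Var(S_n)
b = max(cs) / 2                                  # |Z_i| <= 2b
upper_bound = 4 * b * b * variance + 3 * variance ** 2
```

In this example the two computations of $E(S_n^4)$ agree exactly, and the value is dominated by $4b^2\,\mathrm{Var}(S_n) + 3(\mathrm{Var}(S_n))^2$, as in the chain of inequalities above.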


The solution of the next problem indicates that when Corollary 26 applies, it 
is often easier to use than is the Kolmogorov Three-Series Theorem. 


* Problem 40. Let $(X_1, X_2, \dots)$ be an independent sequence of nonnegative
geometrically distributed random variables and suppose that $E(X_n) = 1/n^2$. Using
three different methods, prove that $\sum_n X_n$ converges a.s.: first by using the
Three-Series Theorem, second by using Corollary 26, and third by using the Monotone
Convergence Theorem.


* Problem 41. [Random signs] Let $(X_1, X_2, \dots)$ be an iid sequence of random
variables with mean 0 and support equal to the 2-point set $\{-1, 1\}$. Let $(c_1, c_2, \dots)$
be a sequence of positive constants, and define $Y_n = c_n X_n$. Find necessary and
sufficient conditions on the sequence $(c_n)$ for the a.s. convergence of $\sum_n Y_n$.


Problem 42. Find an independent sequence (X1, X2,...) of R-valued random vari- 
ables and a positive constant b for which (12.8) diverges and both (12.9) and (12.10) 
converge. 


Problem 43. Find an independent sequence (X1, X2,...) of R-valued random vari- 
ables and a positive constant b for which (12.9) diverges and both (12.8) and (12.10) 
converge. 


Problem 44. Find an independent sequence of random variables, each with finite
variance and with distribution symmetric about 0, for which (12.8), (12.9), and
(12.10) all converge, but $\sum_n \mathrm{Var}(X_n) = \infty$. (This shows that Proposition 25 has
no converse.)




12.8. + The image of a random walk 


We turn to a problem that has a treatment similar to the one we used for the
Strong Law of Large Numbers. Let $(S_0 = 0, S_1, S_2, \dots)$ be a random walk. For
$n = 0, 1, 2, \dots$ let $R_n = \sharp\{S_m : 0 \le m \le n\}$; that is, $R_n$ equals the cardinality of
the image of the random walk through time $n$. Our goal is to study the behavior
of $R_n/n$ as $n \to \infty$. We will prove that $R_n/n$ converges a.s. to the probability
that the random walk never returns to 0. For this purpose we introduce a
sequence $(I_n : n = 0, 1, \dots)$ of indicator random variables:

\[ I_n(\omega) = \begin{cases} 1 & \text{if } S_n(\omega) \ne S_m(\omega) \text{ for } m < n \\ 0 & \text{otherwise.} \end{cases} \tag{12.12} \]

Clearly,

\[ R_n = \sum_{k=0}^{n} I_k . \tag{12.13} \]

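Theorem 28. Let $(S_0 = 0, S_1, S_2, \dots)$ be a random walk in $\mathbb{R}^d$, and, for
$n = 0, 1, 2, \dots$, set $R_n = \sharp\{S_m : 0 \le m \le n\}$. Then, as $n \to \infty$,

\[ \frac{R_n}{n} \to P(\{\omega : S_m(\omega) \ne 0 \text{ for } m > 0\}) \quad \text{a.s.} \]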
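
The quantities $R_n$ and the indicators of first visits to new sites are easy to compute along a given path. The following Python sketch does this for a walk with integer steps; the restriction to integer steps and the function name `range_indicators` are assumptions made for the illustration only.

```python
def range_indicators(steps):
    """For a walk S_0 = 0, S_n = S_{n-1} + steps[n-1], return two lists:
    indicators[n] = 1 if S_n differs from S_m for every m < n, else 0, and
    ranges[n] = number of distinct values among S_0, ..., S_n."""
    position = 0
    visited = {position}
    indicators = [1]          # the condition is vacuous for n = 0
    ranges = [1]
    for step in steps:
        position += step
        is_new = position not in visited
        visited.add(position)
        indicators.append(1 if is_new else 0)
        ranges.append(len(visited))
    return indicators, ranges

# Path 0, 1, 0, 1, 2 produced by the steps +1, -1, +1, +1.
indicators, ranges = range_indicators([1, -1, 1, 1])
```

For this path the running sums of the indicators reproduce the values of $R_n$, which is the elementary counting identity behind the proof that follows.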


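PROOF. Let $I_n$ be defined by (12.12) and $(X_n : n = 1, 2, \dots)$ denote the step
sequence of the random walk. Since the $X_n$ are iid, the distribution of a random
vector of the form $(X_{n_1}, \dots, X_{n_d})$ depends only on $d$ and not on the choice of
the subscripts $n_1, \dots, n_d$, provided only that these subscripts are distinct. This
fact is used for the third equality below:

\begin{align*}
E(I_n) &= P\Bigl(\Bigl\{\omega : \sum_{k=1}^{n} X_k(\omega) \ne \sum_{k=1}^{m} X_k(\omega) \text{ for } 0 \le m < n\Bigr\}\Bigr) \\
&= P\Bigl(\Bigl\{\omega : \sum_{k=m+1}^{n} X_k(\omega) \ne 0 \text{ for } 0 \le m < n\Bigr\}\Bigr) \\
&= P\Bigl(\Bigl\{\omega : \sum_{k=1}^{n-m} X_k(\omega) \ne 0 \text{ for } 0 \le m < n\Bigr\}\Bigr) \\
&= P(\{\omega : S_{n-m}(\omega) \ne 0 \text{ for } 0 \le m < n\}) \\
&= P(\{\omega : S_m(\omega) \ne 0 \text{ for } 0 < m \le n\}) \\
&\searrow P(\{\omega : S_m(\omega) \ne 0 \text{ for } 0 < m\})
\end{align*}

as $n \nearrow \infty$. In view of this calculation, (12.13), and the fact proved in part (iii)
of Problem 10, we conclude that

\[ E\Bigl(\frac{R_n}{n}\Bigr) \to P(\{\omega : S_m(\omega) \ne 0 \text{ for } m > 0\}). \]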




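We next wish to apply Lemma 7. We first obtain a bound on $\mathrm{Cov}(I_n, I_p)$ for
$n < p$:

\begin{align*}
E(I_n I_p) &= P(\{\omega : S_n(\omega) \ne S_m(\omega) \text{ for } m < n,\ S_p(\omega) \ne S_q(\omega) \text{ for } q < p\}) \\
&\le P(\{\omega : S_n(\omega) \ne S_m(\omega) \text{ for } m < n,\ S_p(\omega) \ne S_q(\omega) \text{ for } q \in [n, p)\}) \\
&= P(\{\omega : S_n(\omega) \ne S_m(\omega) \text{ for } m < n\}) \\
&\qquad \cdot P\Bigl(\Bigl\{\omega : \sum_{k=n+1}^{p} X_k(\omega) \ne \sum_{k=n+1}^{q} X_k(\omega) \text{ for } q \in [n, p)\Bigr\}\Bigr) \\
&= P(\{\omega : S_n(\omega) \ne S_m(\omega) \text{ for } m < n\})\, P(\{\omega : S_{p-n}(\omega) \ne S_r(\omega) \text{ for } r < p - n\}) \\
&= E(I_n) E(I_{p-n}).
\end{align*}

Let $b_m = \lfloor c^m \rfloor$ for fixed $c > 1$. Then, using (12.13) and the inequality
$E(I_n I_p) \le E(I_n) E(I_{p-n})$ just proved, along with the elementary facts that
$\mathrm{Var}(I_k) \le 1$ for all $k$ and $\mathrm{Cov}(I_0, I_k) = 0$ for all $k$, we have

\begin{align*}
\sum_{m=0}^{\infty} \mathrm{Var}\Bigl(\frac{R_{b_m}}{b_m}\Bigr)
&= \sum_{m=0}^{\infty} b_m^{-2} \sum_{k=1}^{b_m} \mathrm{Var}(I_k) + 2 \sum_{m=0}^{\infty} b_m^{-2} \sum_{k=1}^{b_m} \sum_{j=0}^{k-1} \mathrm{Cov}(I_j, I_k) \\
&\le \sum_{m=0}^{\infty} b_m^{-1} + 2 \sum_{m=1}^{\infty} b_m^{-2} \sum_{k=2}^{b_m} \sum_{j=1}^{k-1} [E(I_{k-j}) - E(I_k)].
\end{align*}

The first term is finite. We showed earlier in the proof that $n \rightsquigarrow E(I_n)$ is a
decreasing function, so we may apply the Fubini Theorem to bound the second
term by

\begin{align*}
& 2 \sum_{k=2}^{\infty} \sum_{j=1}^{k-1} \sum_{\{m :\, b_m \ge k\}} b_m^{-2} \sum_{i=k-j}^{k-1} [E(I_i) - E(I_{i+1})] \\
&\quad \le \frac{8}{1 - c^{-2}} \sum_{k=2}^{\infty} \sum_{j=1}^{k-1} k^{-2} \sum_{i=k-j}^{k-1} [E(I_i) - E(I_{i+1})] \\
&\quad = \frac{8}{1 - c^{-2}} \sum_{i=1}^{\infty} \sum_{k=i+1}^{\infty} \sum_{j=k-i}^{k-1} k^{-2} [E(I_i) - E(I_{i+1})] \\
&\quad \le \frac{8}{1 - c^{-2}} \sum_{i=1}^{\infty} \Bigl( i \int_i^{\infty} k^{-2}\, dk \Bigr) [E(I_i) - E(I_{i+1})] \\
&\quad = \frac{8}{1 - c^{-2}} \sum_{i=1}^{\infty} [E(I_i) - E(I_{i+1})] \\
&\quad \le \frac{8 E(I_1)}{1 - c^{-2}} .
\end{align*}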




An appeal to Lemma 7 gives the desired limit for $R_{b_m}/b_m$. An argument that
mimics the one used in the proof of Lemma 12 completes the proof. □


Remark 1. The preceding theorem and proof are valid for random walks in 
groups. 


* Problem 45. Find the place or places where the preceding proof breaks down if $S$
is assumed to be a random walk in $\overline{\mathbb{R}}^+$ rather than in $\mathbb{R}^d$. Then decide whether
the theorem itself is true in the $\overline{\mathbb{R}}^+$-setting.


Problem 46. For an arbitrary simple random walk in $\mathbb{Z}$ use the Strong Law of Large
Numbers to calculate the almost sure limit of $(R_n/n)$. Then use Theorem 28 to
calculate the probability that the random walk returns to 0 at least once.


Problem 47. Modify the conclusions for the preceding exercise to encompass ran- 
dom walks in Z whose steps have absolute value 1 or 0. 


CHAPTER 13 
Characteristic Functions 


‘Characteristic functions’ and ‘moment generating functions’ correspond to
distributions on $\mathbb{R}$ and $\overline{\mathbb{R}}^+$, respectively, in a manner analogous to the
correspondence between probability generating functions and distributions on $\overline{\mathbb{Z}}^+$. It will
be seen here and in succeeding chapters that these tools are quite powerful,
particularly when independence is involved. After completing our coverage of the
theory for the real line, we generalize characteristic functions to $\mathbb{R}^d$ and discuss
normal distributions in that setting. At the end of the chapter, we apply the
1-dimensional theory to random walk on $\mathbb{Z}$.


Remark 1. Unfortunately, terminology is not consistent in the literature. 
Many books on real analysis use the term ‘Fourier transform’ or ‘Fourier-Stieltjes 
transform’ for what we call a ‘characteristic function’, and use the term ‘charac- 
teristic function’ for what we have called an ‘indicator function’. And ‘moment 
generating functions’ are known elsewhere as ‘Laplace transforms’ or ‘Laplace- 
Stieltjes transforms’. Others use the term ‘moment generating function’ but with 
a sign difference from what is used here. 


13.1. Definition and basic examples 
We begin with 


Definition 1. The characteristic function of an R-valued random variable $X$
is the C-valued function

\[ v \rightsquigarrow E(e^{ivX}) = E(\cos(vX)) + iE(\sin(vX)), \quad v \in \mathbb{R}, \]

where $i$ denotes a complex number whose square is $-1$.


Notice that for $v \in \mathbb{R}$, the function $x \rightsquigarrow e^{ivx}$ is bounded and continuous (and
thus measurable), so $E(e^{ivX})$ exists and is finite. The characteristic function
of a random variable is also called the characteristic function of its distribution
and also of its distribution function.
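
For a random variable with finitely many values the expectation in the definition is a finite sum, which makes the definition easy to experiment with. The following Python sketch is only an illustration; the function names and the representation of a distribution as a dictionary from values to probabilities are assumptions, not notation from the text.

```python
import cmath
import math

def char_fn(dist, v):
    """E(exp(i v X)) for X with finite distribution dist (value -> probability)."""
    return sum(p * cmath.exp(1j * v * x) for x, p in dist.items())

def char_fn_trig(dist, v):
    """The same quantity written as E(cos(vX)) + i E(sin(vX))."""
    real = sum(p * math.cos(v * x) for x, p in dist.items())
    imag = sum(p * math.sin(v * x) for x, p in dist.items())
    return complex(real, imag)

fair_sign = {-1: 0.5, 1: 0.5}     # equally likely values -1 and +1
lopsided = {0: 0.3, 2: 0.7}       # an asymmetric two-point distribution
```

For `fair_sign` the sine terms cancel and the characteristic function is $v \rightsquigarrow \cos v$; for `lopsided` the imaginary part does not vanish in general.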




Problem 1. Calculate the characteristic functions of the Bernoulli, binomial, geo- 
metric, and Poisson distributions. 


Problem 2. Calculate the characteristic function of the constant random variable 
equal to c. 


Example 1. Let

\[ F(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\, dy . \]

We will sketch a calculation of the characteristic function $\beta$ of this normal
distribution function $F$; some details will be omitted. For $v \in \mathbb{R}$,

\[ \beta(v) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{ivx} e^{-x^2/2}\, dx . \]

We use the Dominated Convergence Theorem to differentiate with respect to $v$
and then integration by parts to obtain a differential equation for $\beta$:

\begin{align*}
\beta'(v) &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} ix\, e^{ivx} e^{-x^2/2}\, dx
 = -\frac{i}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{ivx}\, \frac{d}{dx} e^{-x^2/2}\, dx \\
&= -\frac{v}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-x^2/2} e^{ivx}\, dx = -v\beta(v) .
\end{align*}

So, $\beta$ is the unique solution of the differential equation $\beta'(v) = -v\beta(v)$ satisfying
$\beta(0) = 1$; thus, $\beta(v) = \exp(-v^2/2)$.

Problem 3. Supply the details for the preceding example by showing that $\beta'(v)$
does equal the expression obtained by differentiation inside the integral and also by
commenting on the correctness of the manipulations involving C-valued functions.


Problem 4. Calculate the characteristic function of a normally distributed random 
variable having arbitrary mean and variance. 


Problem 5. Calculate the characteristic function of the exponential distribution
with support $\mathbb{R}^+$ and mean $1/\lambda$, $\lambda > 0$.


Problem 6. [Bilateral exponential distributions] Let $\lambda > 0$ and let $Q$ denote the
probability measure whose density with respect to Lebesgue measure is the function
$x \rightsquigarrow (\lambda/2) e^{-\lambda |x|}$. Show that the characteristic function of $Q$ is the function

\[ v \rightsquigarrow \frac{\lambda^2}{\lambda^2 + v^2} . \]


Problem 7. Let $\beta$ denote the characteristic function of an R-valued random
variable $X$, and let $a$ and $b$ be two real constants. Show that the characteristic function
of the random variable $aX + b$ is the function

\[ v \rightsquigarrow e^{ivb} \beta(av) . \]




Problem 8. Calculate the characteristic function of the uniform distribution on the 
interval [—1,1]. As a check confirm directly that your answer is continuous. Then 
use the preceding proposition to calculate the characteristic function of a uniform 
distribution on an arbitrary interval. Write the characteristic function in a form 
that displays the fact that it is R-valued if and only if the support of the uniform 
distribution is [—a,a] for some a > 0. 


13.2. The Parseval Relation and uniqueness 


The following lemma is a useful tool. In this section, we will use it to prove that 
distinct distributions have distinct characteristic functions. 


Lemma 2. [Parseval Relation] Let $Q$ and $R$ be two probability measures on
$\mathbb{R}$, and denote their characteristic functions by $\beta$ and $\gamma$, respectively. Then

\[ \int \gamma(x - v)\, Q(dx) = \int e^{-ivy} \beta(y)\, R(dy) \]

for each $v \in \mathbb{R}$.


PROOF. The function $(x, y) \rightsquigarrow e^{i(x-v)y}$ is bounded and continuous (and thus
measurable). So its integral with respect to the product measure $Q \times R$ exists. By
the Fubini Theorem this integral equals two different iterated integrals. These
two iterated integrals are the integrals in the relation being proved. □
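
When both measures are concentrated on finitely many points, the two integrals in the Parseval Relation are finite sums and the relation can be verified directly. The following Python sketch does so; the dictionary representation of the measures, the particular measures, and the function names are assumptions made for the illustration.

```python
import cmath

def char_fn(dist, v):
    """Characteristic function of a finitely supported probability measure."""
    return sum(p * cmath.exp(1j * v * x) for x, p in dist.items())

def parseval_sides(q, r, v):
    """Return both sides of the Parseval Relation for finitely supported q and r."""
    # Left side: integral of gamma(x - v) with respect to Q.
    left = sum(p * char_fn(r, x - v) for x, p in q.items())
    # Right side: integral of exp(-i v y) beta(y) with respect to R.
    right = sum(p * cmath.exp(-1j * v * y) * char_fn(q, y) for y, p in r.items())
    return left, right

q = {0: 0.2, 1: 0.5, 3: 0.3}
r = {-1: 0.4, 2.5: 0.6}
left, right = parseval_sides(q, r, 0.8)
```

Both sides are the double sum of $e^{i(x-v)y}$ against the product of the two measures, taken in the two possible orders, which is the finite version of the Fubini argument in the proof.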


There is an important special case of the Parseval Relation. Let $R$ be the
normal distribution with mean 0 and variance $\sigma^2$ and use the formula found in
Problem 4 for the characteristic function of the normal distribution to obtain:

\[ \int e^{-\sigma^2 (x-v)^2/2}\, Q(dx) = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} e^{-ivy} e^{-y^2/2\sigma^2} \beta(y)\, dy . \tag{13.1} \]


Theorem 3. If $Q_1$ and $Q_2$ are probability measures on $\mathbb{R}$ with the same
characteristic function, then $Q_1 = Q_2$.


PROOF. Since $Q_1$ and $Q_2$ have the same characteristic function, (13.1) implies
that

\[ \int e^{-\sigma^2 (x-v)^2/2}\, Q_1(dx) = \int e^{-\sigma^2 (x-v)^2/2}\, Q_2(dx) \]

for $-\infty < v < \infty$ and $\sigma > 0$. Multiply both sides by $\sigma/\sqrt{2\pi}$, integrate over
the interval $(-\infty, a)$ with respect to Lebesgue measure, and apply the Fubini
Theorem to obtain

\[ \int \Bigl[ \int_{(-\infty, a)} \frac{\sigma e^{-\sigma^2 (v-x)^2/2}}{\sqrt{2\pi}}\, dv \Bigr] Q_1(dx)
 = \int \Bigl[ \int_{(-\infty, a)} \frac{\sigma e^{-\sigma^2 (v-x)^2/2}}{\sqrt{2\pi}}\, dv \Bigr] Q_2(dx) \]

for all $a \in \mathbb{R}$ and $\sigma > 0$. On both sides of this equation, the integrand of the
inside integral is the density of the normal distribution with mean $x$ and variance
$1/\sigma^2$, so the inside integral equals the probability that a random variable with
such a distribution lies in the interval $(-\infty, a)$. As $\sigma \to \infty$, this probability
converges to 1 if $x < a$, and to 0 if $x > a$. It converges to 1/2 if $x = a$. It follows
from the Bounded Convergence Theorem that

\[ Q_1((-\infty, a)) + \tfrac{1}{2} Q_1(\{a\}) = Q_2((-\infty, a)) + \tfrac{1}{2} Q_2(\{a\}) \]

for all $a \in \mathbb{R}$. Since there can be at most countably many values of $a$ such that
either $Q_1(\{a\})$ or $Q_2(\{a\})$ is nonzero, we may conclude that there is a dense set
of real numbers $a$ such that $Q_1((-\infty, a)) = Q_2((-\infty, a))$. A straightforward
application of the Uniqueness of Measure Theorem now implies that $Q_1 = Q_2$. □


TABLE 13.1. Characteristic functions of some continuous distributions
[Columns: distribution, support, density $x \rightsquigarrow$, characteristic function $v \rightsquigarrow$.
The table entries did not survive extraction; the only surviving row label is
‘Exponential’.]


The characteristic functions of some distributions are given in Tables 13.1
and 13.2. These are all worked out in examples and problems contained in this
chapter. In view of Problem 7 only one representative distribution of any type
is included.


TABLE 13.2. Characteristic functions of some discrete distributions
[Columns: distribution, support, density $x \rightsquigarrow$, characteristic function $v \rightsquigarrow$.
The table entries did not survive extraction; the surviving row labels are
‘Binomial’ and ‘Neg. Binomial’.]


Example 2. We show here how to compute the characteristic function of
a Cauchy distribution. Our method uses the Parseval Relation in a manner
that is similar to that found in the proof of Theorem 3. Let $Q$ be the Cauchy
distribution with density $x \rightsquigarrow a/[\pi(a^2 + x^2)]$, where $a$ is a positive parameter,
and let $\beta$ be its characteristic function. Also, let $R$ be the normal distribution
with mean $c$ and variance $\sigma^2$. The density of $R$ is

\[ y \rightsquigarrow \frac{e^{-(y-c)^2/2\sigma^2}}{\sqrt{2\pi\sigma^2}} . \]

By Example 1 and Problem 7, the characteristic function of $R$ is

\[ x \rightsquigarrow e^{icx} e^{-\sigma^2 x^2/2} . \]

By the Parseval Relation, with $v = 0$,

\[ \int e^{icx} e^{-\sigma^2 x^2/2}\, \frac{a\, dx}{\pi(a^2 + x^2)} = \frac{1}{\sqrt{2\pi\sigma^2}} \int \beta(y)\, e^{-(y-c)^2/2\sigma^2}\, dy . \tag{13.2} \]

Now make the change of variables $y = -x$ in the left side of (13.2). After
some minor rearrangement of constants, the left side of (13.2) becomes

\[ \frac{1}{a} \sqrt{\frac{2}{\pi\sigma^2}} \int e^{-icy}\, \frac{a^2}{a^2 + y^2}\, \sqrt{\frac{\sigma^2}{2\pi}}\, e^{-\sigma^2 y^2/2}\, dy . \tag{13.3} \]

We wish to apply the Parseval Relation again, this time with (13.3) playing
the role of the right side of the Parseval Relation. The function $y \rightsquigarrow a^2/(a^2 + y^2)$
is the characteristic function of the bilateral exponential distribution with parameter
$\lambda = a$ (see Problem 6), and we let this function play the role of $\beta$ in the right
side of the Parseval Relation. The function

\[ y \rightsquigarrow \sqrt{\frac{\sigma^2}{2\pi}}\, e^{-\sigma^2 y^2/2} \]

is the density of the normal distribution with mean 0 and variance $1/\sigma^2$. This
distribution plays the role of $R$ as we apply the Parseval Relation to (13.3) and
find that it equals

\[ \frac{1}{a} \sqrt{\frac{2}{\pi\sigma^2}} \int e^{-(x-c)^2/2\sigma^2}\, \frac{a e^{-a|x|}}{2}\, dx . \]

Substitute this expression into the left side of (13.2) and rearrange constants to
obtain

\[ \frac{1}{\sqrt{2\pi\sigma^2}} \int e^{-(x-c)^2/2\sigma^2} e^{-a|x|}\, dx = \frac{1}{\sqrt{2\pi\sigma^2}} \int e^{-(y-c)^2/2\sigma^2} \beta(y)\, dy \]

for all $c \in \mathbb{R}$ and $\sigma > 0$. In a manner similar to the end of the proof of Theorem 3,
we may now conclude that $\beta(x) = e^{-a|x|}$ for all $x \in \mathbb{R}$.


Problem 9. Complete the details left at the end of the preceding example. Hint:
Integrate in $c$ from 0 to $t$, let $\sigma^2$ go to 0, and apply the Fundamental Theorem of
Calculus.


Problem 10. (for those who know some complex variable theory) Use contour in- 
tegration to calculate the characteristic function of a Cauchy distribution. 


13.3. Characteristic functions of convolutions 


Now that we have a one-to-one correspondence between the set of probability 
measures on R and the set of characteristic functions, it is natural to inquire 
about the details of this correspondence with respect to operations that are 
important for distributions. The subject of the next theorem is the operation of 
convolution. 




Theorem 4. Let $Q$ and $R$ denote two probability measures on $\mathbb{R}$. Then the
characteristic function of $Q * R$ equals $\beta\gamma$, where $\beta$ and $\gamma$ are the characteristic
functions of $Q$ and $R$.


PROOF. Let $X$ and $Y$ be independent random variables whose distributions
are $Q$ and $R$, respectively. Then the characteristic function $\alpha$ of $Q * R$ satisfies

\[ \alpha(v) = E(e^{iv(X+Y)}) = E(e^{ivX} e^{ivY}) = E(e^{ivX}) E(e^{ivY}) = \beta(v)\gamma(v) . \qquad \square \]
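
For finitely supported measures the convolution can be written out explicitly, and the product rule can then be checked numerically. The sketch below is an illustration only; the dictionary representation and the names `convolve` and `char_fn` are assumptions.

```python
import cmath

def char_fn(dist, v):
    """Characteristic function of a finitely supported probability measure."""
    return sum(p * cmath.exp(1j * v * x) for x, p in dist.items())

def convolve(q, r):
    """Distribution of X + Y for independent X ~ q and Y ~ r."""
    out = {}
    for x, px in q.items():
        for y, py in r.items():
            out[x + y] = out.get(x + y, 0.0) + px * py
    return out

q = {0: 0.5, 1: 0.5}
r = {0: 0.25, 2: 0.75}
qr = convolve(q, r)
gap = abs(char_fn(qr, 1.1) - char_fn(q, 1.1) * char_fn(r, 1.1))
```

Here `gap` compares the characteristic function of the convolution with the product of the two characteristic functions at one sample point; it vanishes up to rounding error.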


Example 3. Our goal is to calculate the characteristic function $\beta_{p,r}$ of the
negative binomial distribution $Q_{p,r}$ with parameters $p$ and $r$. By using probability
generating functions (see Theorem 5 and Problem 23, both of Chapter 10),
we can easily show that

\[ Q_{p,s} * Q_{p,t} = Q_{p,s+t} \]

for all $p \in (0, 1)$ and all $s, t > 0$. It follows from Theorem 4 that, for all such
$p, s, t$,

\[ \beta_{p,s}\, \beta_{p,t} = \beta_{p,s+t} . \tag{13.4} \]

When $r = 1$, $Q_{p,r}$ is the geometric distribution with parameter $p$. The
characteristic function for the geometric distribution is easily calculated directly from
the definition:

\[ \beta_{p,1}(v) = \frac{1 - p}{1 - p e^{iv}} . \]

Repeated applications of (13.4) would therefore seem to show that

\[ \beta_{p,(m/n)}(v) = \bigl( [\beta_{p,1}(v)]^m \bigr)^{1/n} = \Bigl( \frac{1 - p}{1 - p e^{iv}} \Bigr)^{m/n} \tag{13.5} \]

for all positive integers $m, n$.

This formula is indeed correct, provided we properly interpret the $(m/n)$th
power of the C-valued function $\beta_{p,1}$. This is most efficiently done in terms of
the complex logarithm. As indicated in Problem 7 of Appendix E, for any
continuous function $f : \mathbb{R} \to \mathbb{C} \setminus \{0\}$ with $f(0) = 1$, there is a unique continuous
function $\lambda : \mathbb{R} \to \mathbb{C}$ such that $\lambda(0) = 0$ and $e^{\lambda(v)} = f(v)$ for all real $v$. (We
think of $\lambda$ as the complex logarithm of $f$, but we cannot, at first, simply define
$\lambda = \log \circ f$ because the complex logarithm is multivalued.) Applying this result
to the functions $\beta_{p,r}$, we let $\psi_{p,r}$ be the unique continuous C-valued function
defined on $\mathbb{R}$ such that $\psi_{p,r}(0) = 0$ and

\[ \beta_{p,r}(v) = e^{\psi_{p,r}(v)} \]

for all real $v$. It follows from the uniqueness of $\psi_{p,r}$ and (13.4) that

\[ \psi_{p,s} + \psi_{p,t} = \psi_{p,s+t} \]

for all relevant $p, s, t$. In particular,

\[ \psi_{p,(m/n)} = \frac{m}{n}\, \psi_{p,1} , \]

so that

\[ \beta_{p,(m/n)} = e^{(m/n)\psi_{p,1}} . \]

Note that this function is continuous, takes the value 1 at 0, and satisfies the
relationship

\[ (\beta_{p,(m/n)})^n = (\beta_{p,1})^m . \]

That no other function can have these properties follows from the facts that we
have mentioned concerning the uniqueness of the continuous complex logarithm.

We have now given a precise and correct interpretation of (13.5), thereby
providing a formula for $\beta_{p,r}$ for all positive rational $r$:

\[ \beta_{p,r}(v) = e^{r \psi_{p,1}(v)} . \tag{13.6} \]

We claim that this formula is correct for all $r > 0$. The expression on the
right is certainly meaningful for all such $r$, and since the exponential function
is continuous, it is also continuous as a function of $r$, for fixed $p$ and $v$. By
the Dominated Convergence Theorem and the definition of the characteristic
function, the function $r \rightsquigarrow \beta_{p,r}(v)$ is also continuous for fixed $p$ and $v$. It follows
that (13.6) is correct for all $r > 0$, as claimed. For simplicity, the function on
the right of (13.6) is usually written in the less precise form found in Table 13.2.


Problem 11. Give a rigorous interpretation of the formula in Table 13.1 for the 
characteristic function of the gamma distribution. Use arguments similar to those 
given in the preceding example to verify that your interpretation is correct. 


Problem 12. Use characteristic functions to prove that the sum of independent 
Gaussian random variables is Gaussian with mean and variance equal to the re- 
spective sums of the means and variances of the summands. 


Problem 13. Prove the following identity: 


Hint: The right side is the characteristic function of a convergent series of indepen- 
dent random variables. 


Problem 14. [Triangular distributions] For real constants $a > 0$ and $b$, let

\[ f(x) = \begin{cases} a - a^2 |x - b| & \text{if } |x - b| \le 1/a \\ 0 & \text{if } |x - b| > 1/a . \end{cases} \]

Show that $f$ is the density with respect to Lebesgue measure of a probability
measure. For the case $a = b = 1$ calculate the characteristic function of this
measure in two ways: directly and by using Problem 7 of Chapter 10 in conjunction
with Problem 8 and Theorem 4 of this chapter. Then use Problem 7 to calculate
the characteristic function for arbitrary $a$ and $b$, writing the characteristic function
in a form that displays the fact that it is R-valued if and only if $b = 0$.




* Problem 15. Use characteristic functions to decide when the sum of two indepen- 
dent random variables of uniform type is triangularly distributed (as described in 
the preceding problem). For instance, is it necessary that the two independent 
random variables have the same distribution? Is it sufficient? 


13.4. Symmetrization 


A common tool in probability theory is that of ‘symmetrization’. If $R$ is the
distribution of an R-valued random variable $X$ and $Q$ is the distribution of
$-X$, then the symmetrization of $R$ is $R * Q$. Thus the symmetrization of $R$ is
the distribution of the difference of two independent random variables, each of
which has distribution $R$. The following exercises introduce some basic facts
about symmetrization.


Problem 16. Let $\beta$ denote the characteristic function of a random variable $X$.
Show that the characteristic function of $-X$ is $\bar{\beta}$, the complex conjugate of $\beta$.


Problem 17. Let $X$ and $Y$ be independent random variables with common
characteristic function $\beta$. Show that the characteristic function of $X - Y$ is $|\beta|^2$.


Problem 18. Use characteristic functions to show that the bilateral exponential
with parameter $\lambda$ is the symmetrization of the exponential with the same
parameter.


Problem 19. [Bilateral geometric distributions] Calculate the density (with respect 
to counting measure) and the characteristic function of the symmetrization of a 
geometric distribution. 


Problem 20. Prove that a random variable $X$ is symmetric about 0 if and only if
its characteristic function $\beta$ is R-valued, in which case $\beta(v) = E[\cos(vX)]$, $v \in \mathbb{R}$.


Problem 21. Let $(Z_1, Z_2, \dots)$ be an independent sequence of R-valued random
variables, each of which is symmetric about 0. Suppose there is a real constant $c$
such that $|Z_n| \le c$ for all $n$. The Kolmogorov Three-Series Theorem implies that
$\sum_n Z_n$ converges a.s. in R if and only if $\sum_n E(Z_n^2) < \infty$. Use characteristic
functions to prove the ‘only if’ part of this result. (This approach is an
alternative to the use of Corollary 5 of Chapter 5 in the proof of the Kolmogorov
Three-Series Theorem.) Hint: For all $v$ with $|v|$ sufficiently small (depending on $c$),
$\cos(vZ_n) \le 1 - \tfrac{1}{4} v^2 Z_n^2$. Thus, for such $v$, the characteristic function of $\sum_n Z_n$ is
bounded above by $\prod_n [1 - \tfrac{1}{4} v^2 E(Z_n^2)]$. Use Problem 1 of Appendix E.


Problem 22. Find a distribution on Z that is symmetric about 0, but which is not 
the symmetrization of any distribution. 




13.5. Moment generating functions 


Probability generating functions and characteristic functions of distributions are
called transforms of those distributions. We now introduce a third transform,
defined for $\overline{\mathbb{R}}^+$-valued random variables $X$: $\varphi(u) = E(e^{-uX})$ for $0 \le u < \infty$.
It is called the moment generating function of $X$ or of its distribution or of its
distribution function. Just as we defined $1^\infty = 1$ in the definition of probability
generating function, we define $e^{-0 \cdot \infty} = 1$ in the definition of moment generating
function.

The relationship between the probability generating function and the moment
generating function is contained in the substitution $s = e^{-u}$. Note also that since
the power series that defines a probability generating function converges for any
complex number $s$ such that $|s| \le 1$, the substitution $s = e^{iv}$ determines a
similar relationship between probability generating functions and characteristic
functions. This fact implies that there is also a substitution that relates moment
generating functions to characteristic functions (see Problem 24).
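
The two substitutions just described can be seen concretely for a distribution on finitely many nonnegative integers. The following Python sketch is only an illustration; the function names, the dictionary representation, and the sample distribution are assumptions.

```python
import cmath

def pgf(dist, s):
    """E(s^X) for X with finitely many values in {0, 1, 2, ...}."""
    return sum(p * s ** k for k, p in dist.items())

def mgf(dist, u):
    """E(exp(-u X)), the moment generating function with the sign convention used here."""
    return sum(p * cmath.exp(-u * k) for k, p in dist.items())

def char_fn(dist, v):
    """E(exp(i v X)), the characteristic function."""
    return sum(p * cmath.exp(1j * v * k) for k, p in dist.items())

dist = {0: 0.1, 1: 0.6, 4: 0.3}
mgf_gaps = [abs(pgf(dist, cmath.exp(-u)) - mgf(dist, u)) for u in (0.0, 0.3, 2.0)]
char_gaps = [abs(pgf(dist, cmath.exp(1j * v)) - char_fn(dist, v)) for v in (-1.0, 0.4, 3.0)]
```

The lists `mgf_gaps` and `char_gaps` record how far the substituted probability generating function is from the other two transforms at a few sample points; both are zero up to rounding.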

Any of the three transforms (characteristic functions, moment generating
functions, probability generating functions) can be used for a random variable
supported by $\mathbb{Z}^+$. Often a good strategy is to use probability generating
functions for distributions supported by $\mathbb{Z}^+$ (or $\overline{\mathbb{Z}}^+$), moment generating functions
for distributions supported by $\mathbb{R}^+$ (or $\overline{\mathbb{R}}^+$) but not by $\overline{\mathbb{Z}}^+$, and characteristic
functions for other distributions on $\mathbb{R}$.


Problem 23. Prove that the moment generating function of an $\mathbb{R}^+$-valued random
variable is continuous on $[0, \infty)$ and that the moment generating function of an
$\overline{\mathbb{R}}^+$-valued random variable is continuous on $(0, \infty)$.


Problem 24. Find a change of variables that relates characteristic functions to 
moment generating functions of nonnegative finite random variables. Include ap- 
propriate comments about extending the domains of definition of these two types 
of transforms. 


Theorem 5. Let $Q_1$ and $Q_2$ be two probability measures on $\overline{\mathbb{R}}^+$. Then the
moment generating function of $Q_1 * Q_2$ equals $\varphi_1 \varphi_2$, where $\varphi_1$ and $\varphi_2$ are the
moment generating functions of $Q_1$ and $Q_2$, respectively.


Problem 25. Prove the preceding theorem. 


Theorem 6. Different probability measures on [0,00] have different moment 
generating functions. 


PROOF. Let f: R* — (0,1] be the function z ~ e~*. This function sets up a 
one-to-one correspondence between Rt and (0, 1] which is continuous, and hence 


13.5. MOMENT GENERATING FUNCTIONS 219 


measurable, in both directions. Let Q; and Q2 be two probability measures on 
Rt, and let R;,i = 1,2, be the corresponding distributions induced by f on 
(0, 1]. Clearly, Qı = Qe if and only if Rı = R2. We will show that if Qı and Q2 
have the same moment generating function y, then Rı and Rz have the same 
characteristic function. The result then follows immediately from Theorem 3. 

By Theorem 15 of Chapter 4 (with the role of X in that result being played 
by z~ e~*), 


J (e~*)" Q (dz) =) x” Ri(dr), n=0,1,2,.... 
(0,00) (0.1) 


The left side of this equation equals y(n) and the right side is the nt moment 
of Ry. 

By expanding the function $e^{ivx}$ in a power series and applying the Bounded
Convergence Theorem, we can express the characteristic function of $R_1$ in terms
of its moments, and hence in terms of $\varphi$:

$$\int_{(0,1]} e^{ivx}\,R_1(dx) = \sum_{n=0}^{\infty} \frac{(iv)^n}{n!}\int_{(0,1]} x^n\,R_1(dx) = \sum_{n=0}^{\infty} \frac{(iv)^n\varphi(n)}{n!}.$$

The characteristic function of $R_2$ is similarly determined by $\varphi$, so that $R_1$ and
$R_2$ have the same characteristic function, as desired. □
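
The series at the end of the proof can be tried out on a concrete case. The following sketch rests on assumptions that are not part of the text: $Q$ is taken to be the exponential distribution with mean 1, whose moment generating function is $\varphi(u) = 1/(1+u)$, so that the induced distribution $R$ of $e^{-X}$ is uniform on $(0,1]$ with characteristic function $(e^{iv}-1)/(iv)$. The function names are hypothetical.

```python
import cmath

def char_fn_from_mgf(phi, v, terms=60):
    """Partial sum of sum_n (iv)^n * phi(n) / n!, the series from the proof."""
    total = 0j
    term = 1 + 0j                      # holds (iv)^n / n!
    for n in range(terms):
        total += term * phi(n)
        term *= 1j * v / (n + 1)
    return total

def phi_exponential(u):
    """Assumed example: mgf of the exponential distribution with mean 1."""
    return 1.0 / (1.0 + u)

def char_fn_uniform(v):
    """Characteristic function of the uniform distribution on (0, 1]."""
    return (cmath.exp(1j * v) - 1) / (1j * v) if v != 0 else 1 + 0j

for v in (0.5, 2.0, -3.0, 5.0):
    print(v, char_fn_from_mgf(phi_exponential, v), char_fn_uniform(v))
```

Only the values $\varphi(n)$ at nonnegative integers enter the computation, which is the observation behind the corollary that follows.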


The preceding proof actually shows more than is asserted in the theorem. 


Corollary 7. Let $Q_1, Q_2$ be probability measures on $\mathbb{R}^+$ with moment generating
functions $\varphi_1, \varphi_2$. If $\varphi_1(n) = \varphi_2(n)$ for all $n = 1,2,\dots$, then $Q_1 = Q_2$.


Remark 2. There are two alternative proofs of Theorem 6 that may be of 
interest. 

The first relies on complex variable theory to show that the moment generating
function uniquely determines the characteristic function. Let $Q$ be a
distribution on $\mathbb{R}^+$ with moment generating function $\varphi$ and characteristic function
$\beta$. A standard technique ('analytic continuation') can be used to extend $\varphi$
uniquely to a holomorphic function in the interior of the right half of the complex
plane, and then to a continuous function on the entire right half of the complex
plane, including the imaginary axis. The theory implies that when this is done,
$\varphi(-iv) = \beta(v)$ for all $v \in \mathbb{R}$, as might be expected from the formulas for $\varphi$ and
$\beta$. Tables 13.1, 13.2, and 13.3 bear out this relationship.

The second alternative proof avoids the use of complex numbers and characteristic
functions. Let $f$ be any bounded continuous function with domain $[0,\infty)$.
The Stone-Weierstrass Theorem implies that there is a uniformly bounded sequence
$(f_n\colon n = 1,2,\dots)$ of functions of the form $f_n(x) = a_1e^{-u_1x} + \cdots + a_ne^{-u_nx}$,
$a_1,\dots,a_n \in \mathbb{R}$ and $u_1,\dots,u_n \in \mathbb{R}^+$, such that $f_n$ converges to $f$ uniformly
on compact sets as $n \to \infty$. Since $\int f_n(x)\,Q(dx) = a_1\varphi(u_1) + \cdots + a_n\varphi(u_n)$,
the Bounded Convergence Theorem implies that $\int f(x)\,Q(dx)$ is determined
by $\varphi$. Since $f$ is an arbitrary bounded continuous function on $[0,\infty)$, it is
not hard to conclude that $Q$ itself is determined by $\varphi$. (There is a similar proof
of Theorem 3, based on (13.1).)


Problem 26. Verify the entries in the last column of Table 13.3. 


Problem 27. Use part of Table 13.3 to give an alternative solution of Problem 9 
of Chapter 10. 


Problem 28. [The Moment Problem] Let $R$ and $Q$ be probability distributions on
$\mathbb{R}$, each with bounded support. Show that if $R$ and $Q$ have the same $n$th moments
for $n = 0,1,2,\dots$, then $R = Q$. Hint: See the proof of Theorem 6. (Note: The
hypothesis of bounded support can be weakened but not wholly eliminated.)


Problem 29. [Yule-Furry distributions] For $k = 1,2,\dots,m$ and $x > 0$, let $F_k'(x) =
ke^{-kx}$. Use moment generating functions to show that the density on $[0,\infty)$ of
$F_1 * F_2 * \cdots * F_m$ is

$$x \rightsquigarrow me^{-x}(1-e^{-x})^{m-1}.$$

Hint: Use partial fractions.


* Problem 30. Use the preceding problem to calculate the mean and variance of the 
Yule-Furry distributions. 


The following proposition gives a formula for the moment generating func- 
tion of the absolute value of an R-valued random variable X in terms of the 
characteristic function of X. 


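Proposition 8. Let $Q$ be a probability distribution on $(\mathbb{R}, \mathcal{B})$, and let $\beta$ denote
its characteristic function. Then, for $u > 0$,

$$\int e^{-u|x|}\,Q(dx) = \frac{u}{\pi}\int_{-\infty}^{\infty} \frac{\beta(y)}{u^2+y^2}\,dy.$$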


Problem 31. Prove the preceding proposition. Hint: Apply the Parseval Relation, 
with v = 0 and R equaling a Cauchy distribution. 


The standard normal distribution function arises so often that it has been 
studied as a function in much the same way that, say, the sine and exponential 
functions have been studied. In particular, tables of approximate values for it 
have been constructed. And some calculators give approximate values for it just 
as they give approximate values for the exponential function. Actually the values 
given may be for the distribution function of the absolute value of a normally 
distributed random variable having mean 0 and variance 1. This function, called 
the error function, is denoted by 


$$\operatorname{erf}(x) = \sqrt{\frac{2}{\pi}}\int_0^x e^{-w^2/2}\,dw, \quad x \in \mathbb{R}^+.$$


TABLE 13.3. Moment generating functions of some distributions (columns: distribution, support, density $x \rightsquigarrow$, moment generating function $u \rightsquigarrow$; rows include exponential, delta, binomial, and Poisson).


Problem 32. Prove that

$$\frac{b}{\pi}\int_{-\infty}^{\infty} \frac{e^{-ay^2}}{b^2+y^2}\,dy = e^{ab^2}\bigl[1 - \operatorname{erf}(\sqrt{2a}\,b)\bigr]$$

for $a, b > 0$.


Problem 33. Use Proposition 8 and the preceding problem to show that the moment
generating function of the absolute value of a normally distributed random
variable $X$ with mean 0 and variance $\sigma^2$ is the function

$$u \rightsquigarrow e^{\sigma^2u^2/2}\bigl[1 - \operatorname{erf}(\sigma u)\bigr].$$

Show that the derivative from the right at 0 of this function equals $-\sigma\sqrt{2/\pi}$.


It is useful to become familiar with the functions

$$\operatorname{si}(x) = -\int_x^{\infty} \frac{\sin w}{w}\,dw, \quad x \ge 0,$$

and

$$\operatorname{ci}(x) = -\int_x^{\infty} \frac{\cos w}{w}\,dw, \quad x > 0,$$

known as the sine integral and cosine integral, respectively; these integrals are
improper Riemann integrals. As measure-theoretic integrals they do not exist.


* Problem 34. Show that

$$\int_0^{\infty} \frac{e^{-ay}}{u^2+y^2}\,dy = \frac{1}{u}\bigl(\operatorname{ci}(au)\sin(au) - \operatorname{si}(au)\cos(au)\bigr)$$

for $a, u > 0$.


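Problem 35. Let $t > 0$. Use the preceding problem to show that the moment
generating function of the absolute value of a Cauchy random variable $X$ with
density

$$x \rightsquigarrow \frac{t}{\pi(t^2+x^2)}$$

is the function

$$u \rightsquigarrow \begin{cases} 1 & \text{if } u = 0 \\ \dfrac{2}{\pi}\bigl(\operatorname{ci}(tu)\sin(tu) - \operatorname{si}(tu)\cos(tu)\bigr) & \text{if } u > 0. \end{cases} \tag{13.7}$$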


Problem 36. Use the preceding problem and Problem 23 to obtain the following
formula for an important improper Riemann integral:

$$\int_0^{\infty} \frac{\sin x}{x}\,dx = \frac{\pi}{2}.$$

Problem 37. Show that the derivative from the right at 0 of the function (13.7)
equals $-\infty$.




13.6. Moment theorems 


In Chapter 5, we found a relationship between the moments of a distribution $Q$
on $\mathbb{Z}^+$ and the derivatives of the probability generating function of $Q$. There are
analogous relationships involving moment generating functions and characteristic
functions.

Let $Q$ be a distribution on $\mathbb{R}$, and assume that the $n$th moment of $Q$ exists
and is finite for some fixed positive integer $n$. For each $k = 0,\dots,n$, consider
the function $f_k\colon \mathbb{R} \to \mathbb{C}$ defined by

$$f_k(v) = \int x^k e^{ivx}\,Q(dx).$$

An easy argument based on the Dominated Convergence Theorem shows that
each $f_k$ is continuous on $\mathbb{R}$. We wish to calculate $f_k'(v)$ for $k \le n-1$:

$$\lim_{h\to 0} \frac{f_k(v+h) - f_k(v)}{h} = \lim_{h\to 0} \int x^k e^{ivx}\,\frac{e^{ihx}-1}{h}\,Q(dx) = \int x^k e^{ivx} \lim_{h\to 0} \frac{e^{ihx}-1}{h}\,Q(dx) = i f_{k+1}(v). \tag{13.8}$$


Moving the limit inside the integral is justified by the Dominated Convergence
Theorem, the assumption that $Q$ has finite $n$th moment, and the fact that for
all $h \ne 0$,

$$\left| x^k e^{ivx}\,\frac{e^{ihx}-1}{h} \right| = \left| x^k\,\frac{(\cos(hx)-1) + i\sin(hx)}{h} \right| \le 2|x^{k+1}|.$$

Let $\beta$ be the characteristic function of $Q$. Note that $\beta = f_0$. A simple
inductive argument based on (13.8) shows that for $k = 0,\dots,n$

$$\beta^{(k)}(v) = i^k f_k(v),$$

so the derivatives of $\beta$ exist and are finite up to order $n$. Setting $v = 0$ and $k = n$
in this equation gives the following result:


Proposition 9. Let $X$ be an $\mathbb{R}$-valued random variable with characteristic
function $\beta$. If the $n$th moment of $X$ exists and is finite for some nonnegative
integer $n$, then the $n$th derivative of $\beta$ exists, and

$$E(X^n) = (-i)^n\beta^{(n)}(0).$$

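
As a rough numerical illustration of Proposition 9, the derivatives of a characteristic function at 0 can be approximated by central finite differences and multiplied by $(-i)^n$. This is a sketch under stated assumptions: the standard normal characteristic function $v \rightsquigarrow e^{-v^2/2}$ is used as the example, the step size `h` is an arbitrary choice, and the helper names are hypothetical. Finite differences only approximate the derivatives, so the recovered moments are approximate.

```python
import cmath
from math import comb

def nth_derivative(f, v, n, h=0.05):
    """Central finite-difference approximation of the n-th derivative of f at v."""
    total = 0j
    for k in range(n + 1):
        total += (-1) ** k * comb(n, k) * f(v + (n / 2 - k) * h)
    return total / h ** n

def moment_from_char_fn(beta, n):
    """Approximate E(X^n) as (-i)^n times the n-th derivative of beta at 0."""
    return ((-1j) ** n * nth_derivative(beta, 0.0, n)).real

def beta_normal(v):
    """Assumed example: characteristic function of the standard normal distribution."""
    return cmath.exp(-v * v / 2)

# The standard normal moments of orders 0..4 are 1, 0, 1, 0, 3.
approx_moments = [moment_from_char_fn(beta_normal, n) for n in range(5)]
print(approx_moments)
```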
The following problem contains the analogous result for moment generating 
functions: 

Problem 38. Let $X$ be an $\mathbb{R}^+$-valued random variable with moment generating
function $\varphi$. Prove: For all $n$, the $n$th derivative of $\varphi$ exists on $(0,\infty)$, and

$$\varphi^{(n)}(u) = (-1)^n E(X^n e^{-uX}), \quad u > 0.$$

And if $E(X^n)$ is finite, then the $n$th derivative of $\varphi$ exists at 0 (taken from the
right), and

$$E(X^n) = (-1)^n\varphi^{(n)}(0).$$


The results we have obtained so far for the $n$th moment of a random variable
are not as useful as we would like, since they require knowing in advance that
the $n$th moment is finite. For the case of moment generating functions, this
deficiency is remedied by the following theorem:


Theorem 10. Let $X$ be an $\mathbb{R}^+$-valued random variable with moment generating
function $\varphi$. For each positive integer $n$, $E(X^n) < \infty$ if and only if $\varphi^{(n)}(0)$
exists as a finite number, in which case

$$E(X^n) = (-1)^n\varphi^{(n)}(0),$$

where the derivatives at 0 are taken from the right.


PROOF. The 'only if' part of the result is contained in Problem 38. We also
know from Problem 38 that

$$\varphi^{(n)}(u) = (-1)^n E(X^n e^{-uX}) \tag{13.9}$$

for $u > 0$. To prove the 'if' part, we need to extend this formula to allow $u = 0$
when $\varphi^{(n)}(0)$ exists and is finite (taken from the right).

Let us use induction on $n$. The assertion in the theorem makes sense and is
obviously true when $n = 0$, so we can start our induction there. Now assume
that the result is true when $n = k$ for some integer $k \ge 0$. This assumption
together with (13.9) implies that

$$(-1)^{k+1}\varphi^{(k+1)}(0) = (-1)^k \lim_{u \searrow 0} \frac{\varphi^{(k)}(0) - \varphi^{(k)}(u)}{u} = \lim_{u \searrow 0} E\left(X^k\,\frac{1 - e^{-uX}}{u}\right).$$

By the l'Hospital Rule, $[1 - e^{-ux}]/u \to x$ as $u \searrow 0$. Thus, we can complete
the proof by an appeal to the Monotone Convergence Theorem provided that
we can show $[1 - e^{-ux}]/u$ increases as $u \searrow 0$. This desired monotonicity is a
consequence of the fact that the derivative of $[1 - e^{-ux}]/u$ with respect to $u$,
namely

$$\frac{e^{-ux}[1 + ux] - 1}{u^2},$$

is negative for $x > 0$, a fact that can be seen in either of two ways. One way is
to observe that, by the Taylor Formula, $1 + a < e^a$ for $a > 0$. The other is to
note that the function $a \rightsquigarrow e^{-a}(1+a) - 1$ is 0 at 0 and has negative derivative
for $a > 0$. □
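
The monotonicity used in the proof is easy to observe numerically. In the sketch below, the values of `x` and `us` are arbitrary choices, and the second check assumes a concrete example that is not taken from the text: $X$ exponential with mean 1, so that $\varphi(u) = 1/(1+u)$ and $E(X) = 1$.

```python
import math

def quotient(u, x):
    """The difference quotient (1 - exp(-u x)) / u from the proof, for u > 0."""
    return -math.expm1(-u * x) / u

x = 3.0
us = [2.0, 1.0, 0.5, 0.1, 0.01, 0.001]        # decreasing toward 0
values = [quotient(u, x) for u in us]          # expected to increase toward x

def phi(u):
    """Assumed example: mgf of the exponential distribution with mean 1."""
    return 1.0 / (1.0 + u)

# Difference quotients of phi at 0 from the right, with the sign flipped;
# they are expected to increase toward E(X) = 1 as u decreases to 0.
right_quotients = [(phi(0.0) - phi(u)) / u for u in us]

print(values)
print(right_quotients)
```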




Problem 39. Use the preceding theorem together with appropriate calculations as
a check on the correctness of $-\sigma\sqrt{2/\pi}$ in Problem 33 and $-\infty$ in Problem 37.


Problem 40. Let $X = (X_n\colon n = 0,1,2,\dots)$ be an iid sequence of $\overline{\mathbb{R}}^+$-valued random
variables with common moment generating function $\varphi$, and let $N$ be a $\mathbb{Z}^+$-valued
random variable with probability generating function $\rho$. Assume that $N$
is independent of $X$. Show that the moment generating function of the random
variable $S = X_1 + \cdots + X_N$ is $\rho \circ \varphi$, and use this fact to find formulas for $E(S)$ and
$\operatorname{Var}(S)$ in terms of the means and variances (or second moments) of $X_1$ and $N$.


Unfortunately, the story for characteristic functions is more complicated than 
the one for moment generating functions. For example, it is possible that the 
first derivative at 0 of the characteristic function be finite even though the first 
moment does not exist (see Problem 42). 


Theorem 11. Let $\beta$ be the characteristic function of an $\mathbb{R}$-valued random
variable $X$, and let $n$ denote a positive integer. If $E(X^n)$ exists as a finite
number, then $\beta^{(n)}$ exists as a $\mathbb{C}$-valued function on $\mathbb{R}$ and

$$E(X^n) = (-i)^n\beta^{(n)}(0).$$

If $n$ is even and $\beta^{(n)}(0)$ exists as a finite number, then $E(X^n)$ exists as a finite
number.


PROOF. The first part of this result is contained in Proposition 9. It remains
to show that if $n$ is even and $\beta^{(n)}(0)$ is finite, then $E(X^n)$ is finite.

We use induction on $n$, beginning, as in the proof of Theorem 10, at $n = 0$,
for which the assertion is obviously true. Suppose the result is true for $n = k-2$
for some even integer $k \ge 2$, and suppose that $\beta^{(k)}(0)$ exists as a finite number.
The inductive hypothesis implies that $E(X^{k-2})$ is finite. We want to show that
$E(X^k)$ is also finite.

By the l'Hospital Rule,

$$\lim_{v \to 0} \frac{2 - 2\cos vx}{v^2} = x^2.$$

Thus, the Fatou Lemma applies to give

$$\begin{aligned} E(X^k) &\le \liminf_{v \to 0} E\left(X^{k-2}\,\frac{2 - 2\cos vX}{v^2}\right) \\ &= \liminf_{v \to 0} E\left(-X^{k-2}\,\frac{e^{ivX} - 2 + e^{-ivX}}{v^2}\right) \\ &= \liminf_{v \to 0} \left(-\frac{f_{k-2}(v) - 2f_{k-2}(0) + f_{k-2}(-v)}{v^2}\right), \end{aligned} \tag{13.10}$$

where $f_0, f_1, f_2, \dots$ are the functions defined in the discussion leading up to
Proposition 9. Since we are given that $E(X^{k-2})$ is finite, we know from that
discussion that $\beta^{(k-2)}(v) = i^{k-2}f_{k-2}(v)$ for all real $v$. Our assumption that
$\beta^{(k)}(0)$ exists and is finite thus implies that $f_{k-2}'(v)$ exists for $v$ in a neighborhood
of the origin, and that $f_{k-2}''(0)$ also exists and is finite.

These facts allow us to apply the l'Hospital Rule to (13.10), showing in the
process that the limit infimum there is actually a limit:

$$\liminf_{v \to 0} \left(-\frac{f_{k-2}(v) - 2f_{k-2}(0) + f_{k-2}(-v)}{v^2}\right) = -\lim_{v \to 0} \frac{f_{k-2}'(v) - f_{k-2}'(-v)}{2v} = -f_{k-2}''(0),$$

the last step following from the definition of $f_{k-2}''(0)$. After substituting this
expression into (13.10), we may conclude that $E(X^k)$ is finite, as desired. □


Problem 41. Near the end of the preceding proof, why was the application of the
l'Hospital Rule followed by a reference to the definition of the derivative, rather
than by a second application of the l'Hospital Rule?


Problem 42. Let $g$ denote a decreasing function on the interval $[1,\infty)$ such that
$g(x) \to 0$ as $x \to \infty$. Let $Q$ denote the probability measure whose density with
respect to Lebesgue measure is the function

$$x \rightsquigarrow \begin{cases} cx^{-2}g(|x|) & \text{if } |x| \ge 1 \\ 0 & \text{if } |x| < 1, \end{cases}$$

where $c$ is the appropriate constant. (i) Show that $\beta'(0) = 0$, where $\beta$ denotes the
characteristic function of $Q$. (ii) Give an example of a $g$ for which the distribution
$Q$ does not have a mean. Hint: For part (i) make a change of variables before using
a convergence theorem.


Problem 43. Let $X = (X_n\colon n = 0,1,2,\dots)$ be an iid sequence of $\mathbb{R}$-valued random
variables, with common characteristic function $\beta$. Let $N$ be a $\mathbb{Z}^+$-valued random
variable with probability generating function $\rho$. Assume that $N$ is independent of
$X$. Show that $S = X_1 + \cdots + X_N$ has characteristic function $\rho \circ \beta$. Can you justify
formulas for $E(S)$ and $\operatorname{Var}(S)$ that are similar to those obtained in Problem 40?


13.7. Inversion theorems 


Let us start with a simple general result about characteristic functions: 


Proposition 12. Let $\beta$ be the characteristic function of a distribution on $\mathbb{R}$.
Then (i) $\beta(0) = 1$; (ii) $\beta$ is continuous; and (iii) for all $n = 1,2,\dots$, and all
complex $n$-tuples $(z_1,\dots,z_n)$ and real $n$-tuples $(v_1,\dots,v_n)$,

$$\sum_{k=1}^{n}\sum_{j=1}^{n} \beta(v_k - v_j)\,z_k\bar{z}_j \ge 0. \tag{13.11}$$


PROOF. Property (i) is obvious, and Property (ii) follows easily from the
definition of $\beta$ and the Bounded Convergence Theorem. We now prove Property
(iii). Let $Q$ be the distribution corresponding to $\beta$. By the definition of $\beta$ and
the linearity of the integral, the left side of (13.11) equals

$$\int \sum_{k=1}^{n}\sum_{j=1}^{n} e^{i(v_k - v_j)x}\,z_k\bar{z}_j\,Q(dx) = \int \Bigl|\sum_{j=1}^{n} e^{iv_jx}z_j\Bigr|^2 Q(dx) \ge 0.$$

(We have used some familiar facts from complex number theory, namely that
$|z|^2 = z\bar{z}$ for any complex number $z$, and that $e^{-ix} = \overline{e^{ix}}$ for all real $x$.) □
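
Property (iii) can be probed numerically by evaluating the double sum in (13.11) for particular choices of the $v_k$ and $z_k$. The sketch below is illustrative only: the standard normal characteristic function, the indicator-type function, and the sample points are assumptions chosen for the example. A nonnegative value at a few points does not establish positive definiteness, whereas a single negative value does rule it out.

```python
import cmath

def quadratic_form(beta, vs, zs):
    """Double sum over k, j of beta(v_k - v_j) * z_k * conj(z_j)."""
    return sum(
        beta(vk - vj) * zk * zj.conjugate()
        for vk, zk in zip(vs, zs)
        for vj, zj in zip(vs, zs)
    )

def beta_normal(v):
    """Assumed example: characteristic function of the standard normal distribution."""
    return cmath.exp(-v * v / 2)

def indicator(v):
    """A bounded function with value 1 at 0 that is NOT positive definite."""
    return 1.0 if abs(v) <= 1 else 0.0

vs = [0.0, 1.0, 2.0]
zs_real = [1 + 0j, -1.4 + 0j, 1 + 0j]
zs_complex = [1 + 2j, -0.5j, 3 - 1j]

print(quadratic_form(beta_normal, vs, zs_real))     # nonnegative real part
print(quadratic_form(beta_normal, vs, zs_complex))  # nonnegative real part
print(quadratic_form(indicator, vs, zs_real))       # negative real part
```

The negative value in the last line shows that the indicator-type function cannot be a characteristic function, even though it equals 1 at the origin.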


Any $\mathbb{C}$-valued function $\beta$ on $\mathbb{R}$ that satisfies Property (iii) is positive definite.
If the inequality at (13.11) is strict whenever $(z_1,\dots,z_n)$ is not the zero vector,
$\beta$ is strictly positive definite.

It turns out that the converse of Proposition 12 is also true: If $\beta\colon \mathbb{R} \to \mathbb{C}$
is continuous and positive definite, and if $\beta(0) = 1$, then $\beta$ is the characteristic
function of some distribution on $\mathbb{R}$. This converse will be proved in Chapter 14.
In the present section, we will prove two important special cases of the converse,
and for those special cases, provide formulas for calculating the distribution
corresponding to $\beta$. In particular, we will treat all distributions that have densities
with respect to counting measure on $\mathbb{Z}$, and some of the distributions that have
densities with respect to Lebesgue measure on $\mathbb{R}$.

Let $X$ be a $\mathbb{Z}$-valued random variable, with characteristic function $\beta$. Then

$$\beta(v + 2\pi) = E(e^{i(v+2\pi)X}) = E(e^{ivX}e^{2\pi iX}) = E(e^{ivX}) = \beta(v).$$

Thus, the characteristic function of a distribution on $\mathbb{Z}$ is periodic with period
$2\pi$, in addition to having the three properties in Proposition 12. The converse
of this result is also true, as stated in the following theorem, which also contains
the first of our 'inversion formulas'.


Theorem 13. A function $\beta\colon \mathbb{R} \to \mathbb{C}$ is the characteristic function of a $\mathbb{Z}$-valued
random variable if and only if it satisfies Properties (i)-(iii) of Proposition 12
and, in addition, is periodic with period $2\pi$. Furthermore, if $\beta$ is the
characteristic function of a $\mathbb{Z}$-valued random variable $X$, then

$$P(\{\omega\colon X(\omega) = x\}) = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{-ixv}\beta(v)\,dv \tag{13.12}$$

for each integer $x$.
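
The inversion formula for integer-valued random variables can be tried numerically. The sketch below assumes a concrete example that is not from the text, a binomial distribution with 4 trials and success probability 0.3, and replaces the integral over $[-\pi,\pi]$ by an equally spaced Riemann sum; the function name and the number of grid points are hypothetical choices.

```python
import cmath
import math

def invert_integer_char_fn(beta, x, n_points=64):
    """Riemann-sum approximation of (1 / 2 pi) * integral of exp(-i x v) beta(v) dv.

    The sum runs over n_points equally spaced v in [-pi, pi). For a distribution
    with finite support it is exact up to rounding whenever |y - x| < n_points
    for every support point y.
    """
    total = 0j
    for k in range(n_points):
        v = -math.pi + 2 * math.pi * k / n_points
        total += cmath.exp(-1j * x * v) * beta(v)
    return (total / n_points).real

# Assumed example: binomial distribution with 4 trials, success probability 0.3.
n_trials, p = 4, 0.3

def beta_binomial(v):
    return (1 - p + p * cmath.exp(1j * v)) ** n_trials

xs = range(-1, 6)
recovered = [invert_integer_char_fn(beta_binomial, x) for x in xs]
exact = [
    math.comb(n_trials, x) * p**x * (1 - p) ** (n_trials - x) if 0 <= x <= n_trials else 0.0
    for x in xs
]
print(recovered)
print(exact)
```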




PROOF. We have already seen that any characteristic function of a distribution
on $\mathbb{Z}$ satisfies Properties (i)-(iii) and is periodic with period $2\pi$. So now we
consider a function $\beta\colon \mathbb{R} \to \mathbb{C}$ which has these four properties. We wish to show
that $\beta$ is the characteristic function of the distribution on $\mathbb{Z}$ whose density with
respect to counting measure on $\mathbb{Z}$ is given by

$$p_x = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{-ixv}\beta(v)\,dv, \quad x \in \mathbb{Z}. \tag{13.13}$$

We must show three things: that each $p_x$, defined by (13.13), is nonnegative;
that $\sum_{x \in \mathbb{Z}} p_x = 1$; and that the characteristic function of the distribution on $\mathbb{Z}$
thus defined is the function $\beta$.

To prove that each $p_x$ is nonnegative, we first use periodicity to conclude that,
for each $u$:

$$p_x = \frac{1}{2\pi}\int_0^{2\pi} e^{-ix(v-u)}\beta(v-u)\,dv.$$

Thus,

$$p_x = \frac{1}{2\pi}\int_0^{2\pi}\left[\frac{1}{2\pi}\int_0^{2\pi} e^{-ix(v-u)}\beta(v-u)\,dv\right]du.$$

This iterated integral can be written as the limit of Riemann sums, each of
which takes the form appearing in (13.11) (with the $z_j$ equal to $e^{-ixv}$ for various
values of $v$ in $[0,2\pi]$). Since $\beta$ is positive definite, each of the Riemann sums is
nonnegative. We conclude that each $p_x$ is nonnegative.

We now prove, simultaneously, that the quantities $p_x$ sum to 1 and that the
corresponding distribution has characteristic function $\beta$. In order to do this, we
consider, for each positive integer $m$ and each $u \in \mathbb{R}$, the quantity

$$b_m(u) = \sum_{x=-m}^{m}\Bigl(1 - \frac{|x|}{m}\Bigr)e^{iux}p_x. \tag{13.14}$$

Suppose we can show that

$$b_m(u) \to \beta(u), \quad u \in \mathbb{R}. \tag{13.15}$$

Then it would follow from the Monotone Convergence Theorem that

$$\beta(0) = \lim_{m \to \infty} b_m(0) = \sum_{x=-\infty}^{\infty} p_x.$$

Since $\beta(0) = 1$, we would have that $\sum p_x = 1$. The finiteness of $\sum p_x$ would
then allow us to use the Dominated Convergence Theorem to justify taking the
limit as $m \to \infty$ inside the summation in (13.14) to obtain

$$\beta(u) = \sum_{x=-\infty}^{\infty} e^{iux}p_x$$

for all $u \in \mathbb{R}$. The desired conclusion that $\beta$ is the characteristic function of the
distribution on $\mathbb{Z}$ corresponding to the quantities $p_x$ would then follow.
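
The convergence asserted in (13.15) can be observed directly when the $p_x$ are known in advance. The following sketch assumes, purely for illustration, that the $p_x$ are binomial probabilities with 4 trials and success probability 0.3; in that case the difference between $b_m(u)$ and $\beta(u)$ is at most $E|X|/m = 1.2/m$ once $m \ge 4$. The names `pmf`, `beta`, and `b` are hypothetical.

```python
import cmath
import math

# Assumed example: binomial(4, 0.3) probabilities p_x, supported on {0, ..., 4}.
n_trials, p = 4, 0.3
pmf = {
    x: math.comb(n_trials, x) * p**x * (1 - p) ** (n_trials - x)
    for x in range(n_trials + 1)
}

def beta(u):
    """Characteristic function: sum over x of exp(i u x) * p_x."""
    return sum(px * cmath.exp(1j * u * x) for x, px in pmf.items())

def b(m, u):
    """Weighted partial sum: sum over |x| <= m of (1 - |x|/m) * exp(i u x) * p_x."""
    return sum(
        (1 - abs(x) / m) * cmath.exp(1j * u * x) * pmf.get(x, 0.0)
        for x in range(-m, m + 1)
    )

u = 1.3
ms = (5, 50, 500)
errors = [abs(b(m, u) - beta(u)) for m in ms]
print(errors)   # shrinks roughly like 1/m
```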




Thus it remains to prove (13.15). Substitute the expression (13.13) into the
definition of $b_m(u)$ and then change variables to obtain

$$\begin{aligned} b_m(u) &= \frac{1}{2\pi}\int_{-\pi}^{\pi}\sum_{x=-m}^{m}\Bigl(1 - \frac{|x|}{m}\Bigr)e^{-ix(v-u)}\beta(v)\,dv \\ &= \frac{1}{2\pi}\int_{-\pi}^{\pi}\sum_{x=-m}^{m} e^{-ixv}\,\beta(v+u)\,dv - \frac{1}{2\pi}\int_{-\pi}^{\pi}\sum_{x=-m}^{m}\frac{|x|}{m}\,e^{-ixv}\,\beta(v+u)\,dv. \end{aligned}$$

We used the periodicity of both $\beta$ and the function $v \rightsquigarrow e^{-ixv}$ to adjust the
limits of integration after the change of variables. The first sum on the right
of the preceding chain of equalities can be treated as a pair of finite geometric
series, and the second term can be treated as the derivative of a pair of geometric
series. Thus, the two sums can be calculated explicitly, and the result is

$$b_m(u) = \frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{1 - \cos((m+1)v)}{2m\sin^2(v/2)}\,\beta(v+u)\,dv - \frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{\sin((m+\frac12)v)}{m\sin(v/2)}\,\beta(v+u)\,dv. \tag{13.16}$$

The integrands in (13.16) are to be understood as being defined at $v = 0$ so as
to be continuous there. (Why?)

As $m \to \infty$, the second integral in the right side of (13.16) approaches 0 because
its integrand approaches 0 and the Bounded Convergence Theorem applies
by virtue of the inequalities

$$m\,|\sin(v/2)| \ge m\Bigl(\Bigl|\frac{v}{2}\Bigr| - \frac{1}{6}\Bigl|\frac{v}{2}\Bigr|^3\Bigr) = m|v|\Bigl(\frac{1}{2} - \frac{v^2}{48}\Bigr) \tag{13.17}$$

and

$$|\sin((m+\tfrac12)v)| \le (m+\tfrac12)|v|,$$

both of which are consequences of the Taylor Formula with remainder.

The integrand in the first integral in the right side of (13.16) also approaches
0, but this time neither the Bounded nor the Dominated Convergence Theorem
is applicable; and as we will see, the limit of the integral is not 0. We make
the substitution $v = w/(m+1)$ in the first integral in the right side of (13.16),
obtaining

$$\frac{1}{4\pi}\int_{-\pi(m+1)}^{\pi(m+1)} \frac{(1 - \cos w)\,\beta([w/(m+1)] + u)}{m(m+1)\sin^2(w/2(m+1))}\,dw. \tag{13.18}$$

Since $\beta$ is continuous, we see that the integrand approaches $4\beta(u)(1 - \cos w)/w^2$
as $m \to \infty$, so that, were we able to move the limit inside the integral sign and
into the endpoints of integration, we would obtain

$$b_m(u) \to \frac{\beta(u)}{\pi}\int_{-\infty}^{\infty}\frac{1 - \cos w}{w^2}\,dw,$$

which, by the forthcoming Problem 44, equals $\beta(u)$, as desired. To handle the
endpoints of integration, we simply let the interval of integration be $(-\infty,\infty)$
and insert the indicator function of the interval $[-\pi(m+1), \pi(m+1)]$ into the
integrand. We now wish to apply the Dominated Convergence Theorem. The
function $\beta$ is continuous and periodic, so it is bounded. The indicator of the
interval $[-\pi(m+1), \pi(m+1)]$ is also bounded, so it is sufficient to find a function
with finite integral that dominates

$$\frac{1 - \cos w}{m(m+1)\sin^2(w/2(m+1))}.$$

An argument similar to that at (13.17) shows that $m(m+1)\sin^2(w/2(m+1))$
is bounded below by a constant multiple of $w^2$, so some constant multiple of
$(1 - \cos w)/w^2$ will serve as our dominating function, since we have already
noted that this function has finite integral with respect to Lebesgue measure on
$\mathbb{R}$. □


Remark 3. In applications of Theorem 13, the most difficult hypothesis to
check is often the positive definiteness. But the proof shows that this condition
is only needed to prove that the quantities $p_x$ are nonnegative. In many cases of
interest, it is possible to compute these quantities explicitly from (13.13), thus
making the verification of positive definiteness unnecessary.


Problem 44. Prove the following equality, which was used in the preceding proof:

$$\int_{-\infty}^{\infty}\frac{1 - \cos u}{u^2}\,du = 2\int_0^{\infty}\frac{1 - \cos u}{u^2}\,du = \pi.$$

Hint: For $a \ge 0$ let

$$f(a) = \int_0^{\infty}\frac{e^{-au}(1 - \cos u)}{u^2}\,du$$

and use a convergence theorem to prove that $f$ is continuous. Also, use a convergence
theorem to prove that, for $a > 0$,

$$f''(a) = \int_0^{\infty} e^{-au}(1 - \cos u)\,du = \frac{1}{a(1+a^2)}.$$

Show that $f(a) \to 0$ and $f'(a) \to 0$ as $a \to \infty$ and use these facts in conjunction
with two successive calculations of antiderivatives to obtain a formula for $f(a)$,
$a > 0$. Then let $a \searrow 0$ to calculate $f(0)$.


Problem 45. Suppose, in the proof of Theorem 13, that $m$ had not been introduced,
but that instead, the integral formula for $p_x$ had been inserted immediately
with the idea of then, at the very next step, using the Fubini Theorem to
interchange the order of summation and integration. Would that procedure work?


Problem 46. In the proof of Theorem 13 it would seem more natural to introduce
the indicator function of the set of integers having absolute value no larger than
$m$ rather than the factor $(1 - |x|/m) \vee 0$. However, some difficulties arise that did
not arise in the proof as given. Explore this issue.




Problem 47. Use Theorem 13 to show that for all integers $n$, the function $v \rightsquigarrow
\cos nv$ is the characteristic function of a distribution on $\mathbb{Z}$, and use the inversion
formula contained there to calculate the corresponding density. Hint: See the
remark following the proof of the theorem.


* Problem 48. Use Theorem 13 and the symmetrization of a geometric distribution
(see Problem 19) to obtain a formula for the integral

$$\int_{-\pi}^{\pi}\frac{\cos nv}{a - b\cos v}\,dv,$$

where $n$ is an integer and $0 < b < a$.


Problem 49. State and prove the analogue of Theorem 13 for distributions supported
by the set $\{ax\colon x \in \mathbb{Z}\}$, where $a$ is a fixed positive number.


Problem 50. Let $\beta$ be the characteristic function of an $\mathbb{R}$-valued random variable
$X$. Show that if $\beta(2\pi a) = 1$ for some $a \ne 0$, then $aX$ is almost surely $\mathbb{Z}$-valued.
Also show that if $|\beta(2\pi a)| = 1$ for some $a \ne 0$, then there exists a real constant
$b$ such that $aX + b$ is $\mathbb{Z}$-valued. Hint: If $\beta(2\pi a) = 1$, then $E(\cos(2\pi aX)) = 1$. If
$|\beta(2\pi a)| = 1$, then $e^{ic}\beta(2\pi a) = 1$ for some real number $c$.


Problem 51. Let $\beta$ be the characteristic function of an $\mathbb{R}$-valued random variable
$X$. Show that if $|\beta(v)| = 1$ for all $v$ in some nonempty open interval, then $X$ is
almost surely constant. Hint: See the previous problem.


We wish also to obtain an analogue of Theorem 13 for distributions that have 
densities with respect to Lebesgue measure. It turns out that there are no simple 
necessary and sufficient conditions that characterize all such distributions. The 
following result provides a set of sufficient conditions. It also contains a useful 
inversion formula. 


Theorem 14. Let $\beta\colon \mathbb{R} \to \mathbb{C}$ be a function that satisfies Properties (i)-(iii)
of Proposition 12. Suppose, in addition, that

$$\int_{-\infty}^{\infty}|\beta(v)|\,dv < \infty.$$

Then $\beta$ is the characteristic function of a distribution $Q$ on $\mathbb{R}$ that is absolutely
continuous with respect to Lebesgue measure. Furthermore, $Q$ has a continuous
density $f$, given by the formula

$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-ixv}\beta(v)\,dv. \tag{13.19}$$
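
A numerical version of the inversion formula (13.19) is sketched below. It assumes the standard normal characteristic function $v \rightsquigarrow e^{-v^2/2}$ as the example and truncates the integral to a finite interval, which is reasonable only because this particular $\beta$ is negligible outside it; the function name, the truncation limit, and the step count are hypothetical choices.

```python
import cmath
import math

def invert_char_fn(beta, x, limit=12.0, steps=4000):
    """Trapezoidal approximation of (1 / 2 pi) * integral of exp(-i x v) beta(v) dv.

    The integral over the real line is truncated to [-limit, limit].
    """
    h = 2 * limit / steps
    total = 0j
    for k in range(steps + 1):
        v = -limit + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * cmath.exp(-1j * x * v) * beta(v)
    return (total * h / (2 * math.pi)).real

def beta_normal(v):
    """Assumed example: characteristic function of the standard normal distribution."""
    return math.exp(-v * v / 2)

def normal_density(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

for x in (0.0, 0.7, -2.0):
    print(x, invert_char_fn(beta_normal, x), normal_density(x))
```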


PARTIAL PROOF. Let $f$ be the function defined in (13.19). Because of our
assumption about $\beta$, $f$ is clearly well-defined and finite. The Dominated Convergence
Theorem (with $|\beta|$ as the dominating function) implies that $f$ is continuous.
Thus, it remains to prove three facts: that $f$ is nonnegative, that the
integral of $f$ with respect to Lebesgue measure equals 1, and that the distribution
with density $f$ has characteristic function $\beta$. The proof of the second and
third facts is similar to (but simpler than) the analogous part of the proof of
Theorem 13 and is left as an exercise for the reader. We prove here that $f$ is
nonnegative.

First note that a simple change of variables implies that for all $u \in \mathbb{R}$,

$$\int_{-\infty}^{\infty} e^{-ixv}\beta(v)\,dv = \int_{-\infty}^{\infty} e^{-i(v-u)x}\beta(v-u)\,dv.$$

Thus, for all positive numbers $A$,

$$\begin{aligned} 2\pi f(x) &= \frac{1}{2A}\int_{-A}^{A}\int_{-\infty}^{\infty} e^{-i(v-u)x}\beta(v-u)\,dv\,du \\ &= \frac{1}{2A}\int_{-A}^{A}\int_{-A}^{A} e^{-i(v-u)x}\beta(v-u)\,dv\,du + \frac{1}{2A}\int_{[-A,A]}\int_{[-A,A]^c} e^{-i(v-u)x}\beta(v-u)\,dv\,du. \end{aligned}$$

The first iterated integral on the right side of this expression is nonnegative for
all $A$ by the nonnegative definiteness of $\beta$ (see the corresponding part of the
argument in the proof of Theorem 13). To complete the proof, it is sufficient to
show that the second iterated integral converges to 0 as $A \to \infty$.

This second iterated integral breaks naturally into two pieces, one of which is

$$\frac{1}{2A}\int_{-A}^{A}\int_{A}^{\infty} e^{-i(v-u)x}\beta(v-u)\,dv\,du.$$

We will show that this piece converges to 0 as $A \to \infty$. The proof for the other
piece is similar.

$$\begin{aligned} \left|\frac{1}{2A}\int_{-A}^{A}\int_{A}^{\infty} e^{-i(v-u)x}\beta(v-u)\,dv\,du\right| &\le \frac{1}{2A}\int_{-A}^{A}\int_{A}^{\infty}|\beta(v-u)|\,dv\,du \\ &= \frac{1}{2A}\int_{-A}^{A}\int_{A-u}^{\infty}|\beta(w)|\,dw\,du \\ &= \frac{1}{2A}\int_{0}^{\infty}\int_{(A-w)\vee(-A)}^{A}|\beta(w)|\,du\,dw \\ &= \frac{1}{2A}\int_{0}^{\infty}(w \wedge (2A))\,|\beta(w)|\,dw \\ &= \int_{0}^{\infty}\Bigl(\frac{w}{2A} \wedge 1\Bigr)|\beta(w)|\,dw. \end{aligned}$$

The integrand in the last expression converges to 0 as $A \to \infty$, and it is dominated
by $|\beta|$, so the integral converges to 0 as $A \to \infty$ by the Dominated
Convergence Theorem. □




Problem 52. Complete the proof of the preceding theorem by showing that the
integral of $f$ with respect to Lebesgue measure equals 1 and that the distribution
with density $f$ has characteristic function $\beta$. Hint: Instead of summing from $-m$ to
$m$, as was done in the proof of Theorem 13, you should integrate. You will obtain
an expression similar to but simpler than the one in (13.16).


Problem 53. The inversion formula in Theorem 14 is valid for computing some, 
but not all, of the densities in Table 13.1. Which ones are to be excluded? What 
do these densities have in common? Does the beta distribution fit into the category 
of distributions covered by Theorem 14? Why or why not? 


Problem 54. [Normal approximation to binomial] Let $X = (X_1, X_2, \dots)$ be an iid
sequence of Bernoulli random variables with parameter $p = 1/2$, and let $Z$ be a
normally distributed random variable with mean 0 and variance 1. Assume that
$Z$ is independent of the sequence $X$. Fix $\delta > 0$, and let $\beta_n$ be the characteristic
function of

$$\delta Z + \frac{2(X_1 + \cdots + X_n) - n}{\sqrt{n}}.$$

Calculate $\beta_n$ explicitly, and prove that the corresponding distribution has a density
with respect to Lebesgue measure on $\mathbb{R}$. Then show that for all $v \in \mathbb{R}$,

$$\lim_{n \to \infty}\beta_n(v) = e^{-v^2(1+\delta^2)/2}.$$

Use this fact in conjunction with the inversion formula in Theorem 14 to show that
for all real numbers $a < b$,

$$\lim_{n \to \infty} P\Bigl(\Bigl\{\omega\colon a < \delta Z(\omega) + \frac{2(X_1(\omega) + \cdots + X_n(\omega)) - n}{\sqrt{n}} < b\Bigr\}\Bigr) = \frac{1}{\sqrt{2\pi(1+\delta^2)}}\int_a^b e^{-x^2/2(1+\delta^2)}\,dx.$$

Conclude that

$$\lim_{n \to \infty} P\Bigl(\Bigl\{\omega\colon \frac{n + a\sqrt{n}}{2} < \sum_{k=1}^{n} X_k(\omega) < \frac{n + b\sqrt{n}}{2}\Bigr\}\Bigr) = \frac{1}{\sqrt{2\pi}}\int_a^b e^{-x^2/2}\,dx.$$


Problem 55. [Normal approximation to gamma] Let $Q_\gamma$ be the gamma distribution
on $[0,\infty)$ with parameters $1, \gamma$. Show that for all real $a < b$,

$$\lim_{\gamma \to \infty} Q_\gamma\bigl([\gamma + \sqrt{\gamma}\,a,\ \gamma + \sqrt{\gamma}\,b]\bigr) = \frac{1}{\sqrt{2\pi}}\int_a^b e^{-x^2/2}\,dx.$$

Hint: See the previous exercise.


Problem 56. [Parseval Formula] Let $\beta$ be a function satisfying the conditions of
Theorem 14, and let $f$ be a density of the corresponding distribution. Prove that

$$\int_{-\infty}^{\infty} f^2(x)\,dx = \frac{1}{2\pi}\int_{-\infty}^{\infty}|\beta(v)|^2\,dv < \infty.$$

Hint: Use the inversion formula in Theorem 14 for one of the factors of $f(x)$ in
$\int f^2$, switch the order of integration, and make the change of variables $x \rightsquigarrow -x$.
Be sure to justify your use of the Fubini Theorem.




Problem 57. Let $\beta$ be a function that satisfies the conditions of Theorem 14, and
let $f$ be the continuous density of the corresponding distribution. Suppose that
$\beta(v)$ is nonnegative for all $v \in \mathbb{R}$. Show that the function $v \rightsquigarrow f(v)/f(0)$ is the
characteristic function of the distribution whose density with respect to Lebesgue
measure on $\mathbb{R}$ is $\beta/(\int \beta(v)\,dv)$. What examples of this result can you find in
Table 13.1?


13.8. Characteristic functions in $\mathbb{R}^d$


The spaces $\mathbb{R}^d$, $d < \infty$, have enough structure so that characteristic functions
can be defined. For an $\mathbb{R}^d$-valued random variable $X$ with distribution $Q$, the
characteristic function $\beta$ is given by

$$\beta(w) = E(e^{i\langle w, X\rangle}) = \int e^{i\langle w, x\rangle}\,Q(dx), \quad w \in \mathbb{R}^d,$$

where $\langle w, x\rangle$ denotes the Euclidean inner product of $w$ and $x$. Note that characteristic
functions of distributions on $\mathbb{R}^d$ are bounded continuous functions from
$\mathbb{R}^d$ to the complex numbers (the continuity following as in the one-dimensional
case from the Bounded Convergence Theorem).


Problem 58. Let $w \rightsquigarrow \beta(w)$ be the characteristic function of an $\mathbb{R}^d$-valued random
variable $X$. Show that

$$w_j \rightsquigarrow \beta(0,\dots,0,w_j,0,\dots,0)$$

is the characteristic function of $X_j$, the $j$th coordinate of $X$.


Problem 59. Calculate the characteristic function of an $\mathbb{R}^d$-valued random variable
uniformly distributed on $[-1,1]^d$.


Problem 60. Let $Z = (X + 2, Y - 2)$, where $X$ and $Y$, respectively, denote the
smaller and larger of the two components of a random vector uniformly distributed
on the square region $[-6,6]^2$ in $\mathbb{R}^2$. Show that the characteristic function of $Z$ is
the function on $\mathbb{R}^2$ given by

$$(u,v) \rightsquigarrow \frac{(u+v)e^{-4i(u-v)} - ue^{-4i(u+2v)} - ve^{4i(2u+v)}}{72\,uv(u+v)}.$$


Problem 61. Show that the gradient of the characteristic function in Problem 60
is zero at the origin. Give a probabilistic interpretation of this result.


Problem 62. Let $X = (X_1, X_2, \dots, X_d)$ be a random vector having independent
coordinates each of which is normally distributed with mean 0. Denote the standard
deviation of $X_j$ by $\sigma_j$. Show that the characteristic function of $X$ is

$$w \rightsquigarrow \exp\Bigl(-\frac{1}{2}\sum_{j=1}^{d} w_j^2\sigma_j^2\Bigr) = \exp\Bigl(-\frac{1}{2}\,w\Sigma w^T\Bigr),$$

where the diagonal matrix $\Sigma$ is the covariance matrix of $X$ and $w$ is viewed as a
row matrix in the matrix product.


Problem 63. Let $X$ and $Y$ be independent $\mathbb{R}$-valued normally distributed random
variables having mean 0 and variance 1. Find the characteristic functions of the
$\mathbb{R}^2$-valued random variable $(X, X+Y)$ and the $\mathbb{R}^3$-valued random variable
$(X-Y, X, X+Y)$.


As a general rule, everything about characteristic functions of probability
measures on $\mathbb{R}$ which one could reasonably expect to carry over to the $\mathbb{R}^d$-setting
does so. Here is one instance.


Lemma 15. [Parseval Relation (for $\mathbb{R}^d$)] Let $Q$ and $R$ be two probability measures
on $\mathbb{R}^d$, and denote their characteristic functions by $\beta$ and $\gamma$, respectively.
Then

$$\int \gamma(x - v)\,Q(dx) = \int e^{-i\langle v, y\rangle}\beta(y)\,R(dy)$$

for each $v \in \mathbb{R}^d$.


Problem 64. Prove the preceding lemma.


We specialize Lemma 15 using Problem 62:

$$\int e^{-\sigma^2|x-v|^2/2}\,Q(dx) = \frac{1}{(2\pi\sigma^2)^{d/2}}\int e^{-i\langle v, y\rangle}e^{-(y_1^2 + \cdots + y_d^2)/2\sigma^2}\beta(y)\,dy_1 \dots dy_d.$$

Thus, if $Q_1$ and $Q_2$ have the same characteristic function $\beta$, then

$$\int e^{-\sigma^2|x-v|^2/2}\,Q_1(dx) = \int e^{-\sigma^2|x-v|^2/2}\,Q_2(dx) \tag{13.20}$$

for every $\sigma > 0$ and $v \in \mathbb{R}^d$. Let $a_1,\dots,a_d$ be numbers such that

$$Q_1(\{v\colon v_j = a_j \text{ for some } j\}) = 0 = Q_2(\{v\colon v_j = a_j \text{ for some } j\}).$$

Multiply both sides of (13.20) by $\sigma^d/(2\pi)^{d/2}$, integrate with respect to $v$ over
the region

$$\{v\colon v_1 \le a_1,\dots,v_d \le a_d\}, \tag{13.21}$$

and let $\sigma \to \infty$ to obtain (see the proof of Theorem 3)

$$Q_1(\{v\colon v_1 \le a_1,\dots,v_d \le a_d\}) = Q_2(\{v\colon v_1 \le a_1,\dots,v_d \le a_d\}).$$

The collection of sets of the form (13.21) is closed under pairwise intersections
and generates the Borel $\sigma$-field. Thus, $Q_1 = Q_2$. We have proved the following
important theorem.


236 13. CHARACTERISTIC FUNCTIONS 


Theorem 16. Distinct probability measures on RÊ, d < œ, have distinct 
characteristic functions. 


Corollary 17. Suppose that $X$ and $Y$ are $\mathbb{R}^d$-valued random variables such
that, for every $w \in \mathbb{R}^d$, $\langle w, X\rangle$ and $\langle w, Y\rangle$ have the same distribution.
Then $X$ and $Y$ have the same distribution.


Problem 65. Prove the preceding corollary. 


Corollary 18. Let $d < \infty$. The coordinates of an $\mathbb{R}^d$-valued random variable
$X$ are independent if and only if the characteristic function of $X$ has the form

\[ w \mapsto \prod_{j=1}^d \beta_j(w_j) \tag{13.22} \]

for some continuous functions $\beta_j$ each of whose values at 0 is 1, in which case
$\beta_j$ is the characteristic function of $X_j$, the $j$th coordinate of $X$.


PROOF. Suppose the coordinates of $X$ are independent. Then its characteristic
function equals

\[ w \mapsto E\big(e^{i\langle w, X\rangle}\big) = E\Big(\prod_{j=1}^d e^{i w_j X_j}\Big)
   = \prod_{j=1}^d E\big(e^{i w_j X_j}\big), \]

which has the form (13.22), with $\beta_j$ equal to the characteristic function of $X_j$.

For the converse suppose that (13.22) holds and that $\beta_j(0) = 1$ for each
$j$. By Problem 58, $\beta_j$ is the characteristic function of $X_j$. Let $Y$ be a vector
with independent coordinates, the distribution of each $Y_j$ being the same as the
distribution of the corresponding coordinate $X_j$. By the part of the corollary
already proved, the characteristic function of the random vector $Y$ is the function
(13.22), that is, the characteristic function of the random vector $X$. By
Theorem 16, $X$ and $Y$ have the same distribution and thus $X$, like $Y$, has
independent coordinates. □


Problem 66. Show that the sum and difference of two iid normally distributed 
R-valued random variables are independent. 


Problem 67. Suppose that $X$ is an $\mathbb{R}^2$-valued random variable and that its
characteristic function has the form $w \mapsto \beta_1(w_1)\beta_2(w_2)$, with $\beta_1(0) = 4 + 3i$. Can any
interesting conclusion be drawn? If so, what?


The following proposition gives a formula for the moment generating function 
of the Euclidean norm of a random vector in terms of the characteristic function 
of the random vector itself. Thus, it generalizes Proposition 8. 




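Proposition 19. Let $\beta$ denote the characteristic function of a distribution $Q$
on $\mathbb{R}^d$. Then, for $u > 0$,

\[ \int e^{-u|x|}\, Q(dx)
   = \frac{\Gamma\big(\frac{d+1}{2}\big)}{\pi^{(d+1)/2}}
     \int \frac{u}{(u^2 + |y|^2)^{(d+1)/2}}\, \beta(y)\, \lambda^d(dy), \]

where $\lambda$ denotes Lebesgue measure on $\mathbb{R}$.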


Problem 68. Prove the preceding proposition. 


13.9. Normal distributions on d-dimensional space 


An $\mathbb{R}^d$-valued random vector $X$ is said to be normally distributed if each inner
product of $X$ with a member of $\mathbb{R}^d$ is an $\mathbb{R}$-valued normally distributed
random variable. The distribution of such a normally distributed random vector
is called a normal distribution or Gaussian distribution on $\mathbb{R}^d$. The next result
characterizes all normal distributions on $\mathbb{R}^d$.


Theorem 20. There is a one-to-one correspondence between the family of
normal distributions on $\mathbb{R}^d$ and the set of ordered pairs $(c, \Sigma)$, where $c \in \mathbb{R}^d$ and
$\Sigma$ is a symmetric positive definite $d \times d$ matrix. The distribution corresponding
to $(c, \Sigma)$ has mean vector $c$, covariance matrix $\Sigma$, and characteristic function

\[ w \mapsto e^{-\frac12 w \Sigma w^T + i\langle w, c\rangle}. \]

PARTIAL PROOF. Let $Q$ be a normal distribution on $\mathbb{R}^d$, and let $X$ be a random
vector having distribution $Q$. For each $z \in \mathbb{R}^d$, the inner product $\langle z, X\rangle$
is an $\mathbb{R}$-valued normally distributed random variable. By setting $z$ equal to
unit vectors in each coordinate direction we see that each coordinate $X_i$ of
$X$ is normally distributed. In particular, $\mathrm{Var}(X_i) < \infty$ for each $i$ and, hence,
$|\mathrm{Cov}(X_i, X_j)| < \infty$ for each $i$ and $j$. Denote the mean vector and covariance
matrix of $X$ by $c$ and $\Sigma$, respectively. By Theorem 11 of Chapter 5, $\Sigma$ is
symmetric and positive definite. □


Problem 69. Complete the proof of the preceding theorem. Hint: For each $w \in \mathbb{R}^d$,
find the mean and variance of the normally distributed random variable $\langle w, X\rangle$ in
terms of $w$, $c$, and $\Sigma$.


Theorem 21. Let $Q$ be a normal distribution on $\mathbb{R}^d$ and denote its mean
vector and covariance matrix by $c$ and $\Sigma$, respectively. Then the support of $Q$ is
the subspace of $\mathbb{R}^d$ perpendicular to $\{x \in \mathbb{R}^d : \Sigma x^T = 0\}$. The support equals $\mathbb{R}^d$
if and only if $\Sigma$ is strictly positive definite, in which case $Q$ has a density with
respect to $d$-dimensional Lebesgue measure given by

\[ x \mapsto \frac{1}{\sqrt{(2\pi)^d \det(\Sigma)}}\; e^{-\frac12 (x-c)\Sigma^{-1}(x-c)^T}. \]


Problem 70. Prove the preceding theorem. Hint: Use the Change of Variables 
Theorem. 


13.10. † An application to random walks on $\mathbb{Z}$


We will use characteristic functions to show that a Z-valued random walk whose 
steps have mean 0 returns to its starting position with probability 1. 


Theorem 22. Let $S = (S_0 = 0, S_1, \dots)$ be a $\mathbb{Z}$-valued random walk for which
$E(S_1) = 0$. Then,

\[ P(\{\omega : S_n(\omega) = 0 \text{ for some } n > 0\}) = 1. \]


PROOF. Denote the step distribution by $Q$ and its characteristic function by
$\beta$. In view of Corollary 13 of Chapter 11, we only need show that

\[ \sum_{n=0}^{\infty} Q^{*n}(\{0\}) = \infty. \]

By the Monotone Convergence Theorem, this is equivalent to

\[ \sum_{n=0}^{\infty} s^n Q^{*n}(\{0\}) \to \infty \tag{13.23} \]

as $s \nearrow 1$.

Since $\beta^n$ is the characteristic function of $Q^{*n}$, we conclude from (13.12) that

\[ Q^{*n}(\{0\}) = \frac{1}{2\pi} \int_{-\pi}^{\pi} (\beta(v))^n\, dv. \]

For $0 < s < 1$ we use the Fubini Theorem applied to the product of counting
measure on the nonnegative integers and Lebesgue measure on $[-\pi, \pi]$ to obtain

\[ \sum_{n=0}^{\infty} s^n Q^{*n}(\{0\}) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \frac{1}{1 - s\beta(v)}\, dv. \]

Since the left side is real, we may replace the integrand by its real
part. Thus, for $s < 1$,

\[ \sum_{n=0}^{\infty} s^n Q^{*n}(\{0\})
   = \frac{1}{2\pi} \int_{-\pi}^{\pi}
     \frac{1 - s\Re(\beta(v))}{[1 - s\Re(\beta(v))]^2 + s^2[\Im(\beta(v))]^2}\, dv, \tag{13.24} \]

where $\Re$ and $\Im$ indicate the real and imaginary parts, respectively.


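Let $\varepsilon > 0$. Since $\beta'(0) = 0$, there exists a positive $v_\varepsilon$ such that
$1 - \Re(\beta(v)) \le \varepsilon|v|$ and $|\Im(\beta(v))| \le \varepsilon|v|$ for $|v| \le v_\varepsilon$. Using the inequality
$2ab \le a^2 + b^2$ for real numbers $a$ and $b$, we obtain

\[
\begin{aligned}
[1 - s\Re(\beta(v))]^2 + s^2[\Im(\beta(v))]^2
 &= [1-s]^2 + 2[1-s]s[1 - \Re(\beta(v))] + s^2[1 - \Re(\beta(v))]^2 + s^2[\Im(\beta(v))]^2 \\
 &\le 2[1-s]^2 + 2s^2[1 - \Re(\beta(v))]^2 + s^2[\Im(\beta(v))]^2 \\
 &\le 2[1-s]^2 + 3\varepsilon^2 v^2
\end{aligned}
\]

for $|v| \le v_\varepsilon$. Combining this inequality with the obvious inequality

\[ 1 - s\Re(\beta(v)) \ge 1 - s, \]

we obtain

\[
\sum_{n=0}^{\infty} s^n Q^{*n}(\{0\})
 \ge \frac{1}{2\pi} \int_0^{v_\varepsilon} \frac{1-s}{2[1-s]^2 + 3\varepsilon^2 v^2}\, dv
 = \frac{1}{2\pi\varepsilon\sqrt6} \arctan\frac{\varepsilon v_\varepsilon \sqrt3}{(1-s)\sqrt2}
 \to \frac{1}{4\varepsilon\sqrt6} \quad \text{as } s \nearrow 1.
\]

Since $\varepsilon$ is arbitrary we conclude that $\sum s^n Q^{*n}(\{0\}) \to \infty$ as $s \nearrow 1$, as
desired. □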
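
As a numerical sanity check of (13.24), and not as part of the proof, one can look at the simple symmetric walk, whose steps are $\pm 1$ with probability $1/2$ each, so that $\beta(v) = \cos v$ and $Q^{*2m}(\{0\}) = \binom{2m}{m}4^{-m}$. The following Python sketch compares the generating series with the integral; the function names `series` and `integral` and the truncation levels are illustrative assumptions, not anything taken from the text. Both quantities should agree with the closed form $1/\sqrt{1-s^2}$, which grows without bound as $s \nearrow 1$.

```python
import math

def series(s, terms=20000):
    """Truncated sum of s^n Q^{*n}({0}) for the simple symmetric walk on Z.

    Only even n contribute; prob tracks C(2m, m) / 4^m.
    """
    total, prob = 0.0, 1.0
    for m in range(terms):
        total += prob * s ** (2 * m)
        prob *= (2 * m + 1) / (2 * m + 2)  # ratio of consecutive return probabilities
    return total

def integral(s, steps=20000):
    """Midpoint-rule value of (1/2pi) * int_{-pi}^{pi} dv / (1 - s cos v)."""
    h = 2 * math.pi / steps
    acc = 0.0
    for j in range(steps):
        v = -math.pi + (j + 0.5) * h
        acc += h / (1 - s * math.cos(v))
    return acc / (2 * math.pi)

if __name__ == "__main__":
    for s in (0.9, 0.99, 0.999):
        print(s, series(s), integral(s), 1 / math.sqrt(1 - s * s))
```

The three printed columns should agree to many digits for each $s$, and all of them increase as $s$ approaches 1.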


Problem 71. What step or steps in the preceding proof would break down were
the factors $s^n$ not introduced?


* Problem 72. In Theorem 22, can one replace the hypothesis $E(S_1) = 0$ by the
weaker hypothesis that $\beta'(0) = 0$, where $\beta$ denotes the characteristic function of
$S_1$?


Problem 73. The Fatou Lemma, the Dominated Convergence Theorem, the Mono- 
tone Convergence Theorem, and the Fubini Theorem have been important in this 
chapter. Write a short essay describing some of these techniques, lifting, as appro- 
priate, specific examples from this chapter. 


13.11. † An application to the calculation of a sum


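The sum $\sum_{k=1}^{\infty} k^{-2}$ arises fairly often in probabilistic settings. One of the many
ways of showing that it equals $\pi^2/6$ is described in the next problem.


Problem 74. Complete the missing steps in the calculation outlined below. For
$\varepsilon \ge 0$, let $Q_\varepsilon$ be the probability measure on $\mathbb{Z}$ whose density with respect to
counting measure is

\[ k \mapsto \begin{cases} 0 & \text{if } k = 0 \\[4pt] g(\varepsilon)\, \dfrac{e^{-\varepsilon|k|}}{k^2} & \text{if } k \ne 0, \end{cases} \]

where $g(\varepsilon) > 0$ is determined by the condition that $Q_\varepsilon$ be a probability measure.
Our goal is to show that $g(0) = 3/\pi^2$. For each $\varepsilon$, let $\varphi_\varepsilon$ denote the characteristic
function of $Q_\varepsilon$.

Fix $\varepsilon > 0$. We have

\[ \varphi_\varepsilon''(v) = -\sum_{k=1}^{\infty} \big(e^{ivk} + e^{-ivk}\big) e^{-\varepsilon k} g(\varepsilon)
   = 2g(\varepsilon)\, \frac{1 - e^{\varepsilon}\cos v}{1 - 2e^{\varepsilon}\cos v + e^{2\varepsilon}}. \]

Since $\varphi_\varepsilon'(0) = 0$, we can integrate $\varphi_\varepsilon''$ to obtain

\[ \varphi_\varepsilon'(v) = g(\varepsilon)v - 2g(\varepsilon) \arctan\Big[\Big(\frac{e^{\varepsilon}+1}{e^{\varepsilon}-1}\Big) \tan\frac{v}{2}\Big], \quad |v| \le \pi, \]

with appropriate interpretations for $v = \pm\pi$. Hence, for $|v| \le \pi$,

\[ \varphi_\varepsilon(v) = 1 + g(\varepsilon) \int_0^v \Big(u - 2\arctan\Big[\Big(\frac{e^{\varepsilon}+1}{e^{\varepsilon}-1}\Big) \tan\frac{u}{2}\Big]\Big)\, du. \]

Let $\varepsilon \searrow 0$ to obtain

\[ \varphi_0(v) = 1 + g(0) \int_0^v [u - \pi\,\mathrm{sgn}(u)]\, du = 1 + g(0)\Big(\frac{v^2}{2} - \pi|v|\Big). \]

Since $\int_{-\pi}^{\pi} \varphi_0(v)\, dv = 2\pi\, Q_0\{0\} = 0$,

\[ 0 = 2\pi + g(0)\Big(\frac{-2\pi^3}{3}\Big), \]

from which follows $g(0) = 3/\pi^2$, as desired. Rewriting this conclusion we have

\[ \sum_{k=1}^{\infty} \frac{1}{k^2} = \frac{\pi^2}{6}. \]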

PART 3 


Convergence in Distribution 




Two random variables having a common distribution can, for some purposes, 
be treated as the same. It follows that in some settings one should introduce 
a convergence concept that focuses on distributions of random variables rather 
than the random variables themselves. Such a convergence concept is the focus 
of this part of the book. 

For distributions on $\mathbb{R}$ and $\overline{\mathbb{R}}$, the basic definitions, results, and examples are
found in Chapter 14, along with several applications. Two of the best known 
results of classical probability theory are found in Chapter 15: the Classical 
Central Limit Theorem and the general Weak Law of Large Numbers. That 
chapter also contains a brief introduction to the theory of ‘large deviations’, and a 
‘local’ version of the central limit theorem. Chapter 16 gives a characterization of 
infinitely divisible distributions, and uses that characterization to give a complete 
solution to the ‘central limit problem’ for ‘triangular arrays’, which concerns the 
limiting behavior of sequences of sums of independent random variables. The 
full story is somewhat long, but we have organized Chapter 16 in such a way 
that useful parts of the story can be learned without covering the entire chapter. 
Chapter 17 treats ‘stable’ distributions and their relationship to the limiting 
behavior of sums of iid random variables. 

In Chapter 18, we leave the $\mathbb{R}$- and $\overline{\mathbb{R}}$-settings and turn our attention to
the convergence of distributions on ‘Polish spaces’ (complete separable metric 
spaces). With appropriate care, most of the theory from the R-setting can be 
adapted to Polish spaces. A major application of this generalization can be found 
in Chapter 19, where we show how to construct ‘Brownian motion’ and prove 
the Invariance Principle, which says that the distribution of Brownian motion 
is the ‘scaling limit’ of the distribution of every random walk in R whose steps 
have mean 0 and finite second moment. 


Certain notational shortcuts have shown themselves to be useful in probability. 

Their frequent use will begin in this part. In expressions such as

\[ \{\omega : X(\omega) \in A\}, \]

the ‘$\omega$’ will often be suppressed, and we will instead write

\[ [X \in A]. \]

This abbreviated notation is often combined with an omission of parentheses as
illustrated by

\[ P[X \in A] = P([X \in A]) = P(\{\omega : X(\omega) \in A\}). \]


Parentheses may also be omitted in other contexts. For example, if $Q$ is a
probability measure on $(\mathbb{R}, \mathcal{B})$, we may write $Q(a, b]$ for $Q((a, b])$ and $Q\{x\}$ for
$Q(\{x\})$. And if $f$ is a function with domain $\mathbb{R}^2$, we will often adopt the common
convention of writing $f(x, y)$ for $f((x, y))$.


CHAPTER 14 


Convergence in Distribution 
on the Real Line 


In this chapter, we introduce a concept of convergence for sequences of
distributions on $\mathbb{R}$ and $\overline{\mathbb{R}}$. This ‘convergence in distribution’ gives us a rigorous way to
express the idea that two distributions are close to each other. For instance, we 
show in an example that a Poisson distribution can be approximated arbitrarily 
closely by binomial distributions. An important result, the Continuity Theorem, 
gives a criterion for the convergence of a sequence of distributions in terms of 
the corresponding sequence of characteristic functions. There are several other 
useful criteria as well, which are collected together in a result known as the Port- 
manteau Theorem. Although the most important applications of convergence in 
distribution will be found in later chapters, some are included here, including 
an introduction to the theory of ‘extreme values’, a discussion of the effects that 
‘scaling’ and ‘centering’ have on sequences of distributions, and characterizations 
of moment generating functions and characteristic functions. 


14.1. Definitions and examples 


The following three examples serve as motivation for our main definition. 


Example 1. [Exponential distribution as a limit] Let $Q$ be the standard
exponential distribution with mean 1. Thus, $Q$ has a density $f$ with respect to
Lebesgue measure given by

\[ f(x) = \begin{cases} e^{-x} & \text{if } x \ge 0 \\ 0 & \text{if } x < 0. \end{cases} \]

Let $Q_n$ denote the distribution of geometric type that has mean 1 and whose
support consists of all nonnegative integral multiples of $1/n$. So, for $x$ a nonnegative
integral multiple of $1/n$,

\[ Q_n\{x\} = \frac{1}{n+1}\Big(1 + \frac1n\Big)^{-nx}. \]

Let $F$ and $F_n$ denote the distribution functions corresponding to $Q$ and $Q_n$,
respectively. The functions $F(x)$ and $F_n(x)$ are 0 for $x < 0$, and, for $x \ge 0$,
$F(x) = 1 - e^{-x}$ and

\[ F_n(x) = 1 - \Big(1 + \frac1n\Big)^{-(k+1)} \quad \text{if } \frac{k}{n} \le x < \frac{k+1}{n}. \]

Thus, $F_n(x) \to F(x)$ as $n \to \infty$ for all $x$. On the other hand, it is not true
that $Q_n(B) \to Q(B)$ for all Borel subsets $B$ of $\mathbb{R}$; for instance, if $B$ is the set of
rational numbers, then $Q_n(B) = 1$ for each $n$, but $Q(B) = 0$. Is it more natural
to say that the sequence $(Q_1, Q_2, \dots)$ converges to $Q$ or to say that it does not?
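
The pointwise convergence of the distribution functions in Example 1 can be observed numerically. The following Python sketch is a minimal illustration; the helper names `F`, `F_n`, and `max_gap`, the grid, and the cutoff at 10 are assumptions made for the illustration. It evaluates $|F_n(x) - F(x)|$ on a finite grid and reports the largest value found.

```python
import math

def F(x):
    """Distribution function of the standard exponential distribution."""
    return 1 - math.exp(-x) if x >= 0 else 0.0

def F_n(x, n):
    """Distribution function of the geometric-type distribution Q_n of Example 1."""
    if x < 0:
        return 0.0
    k = math.floor(n * x)  # index with k/n <= x < (k+1)/n
    return 1 - (1 + 1 / n) ** (-(k + 1))

def max_gap(n, grid=2000, upper=10.0):
    """Largest |F_n - F| over an equally spaced grid on [0, upper]."""
    return max(abs(F_n(j * upper / grid, n) - F(j * upper / grid))
               for j in range(grid + 1))

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, max_gap(n))
```

The reported gap shrinks roughly like $1/n$, even though $Q_n$ assigns probability 1 to the rationals for every $n$ while $Q$ assigns them probability 0.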


Example 2. Let $X$ be an $\mathbb{R}$-valued random variable whose distribution is $Q$.
Suppose that $Q(B) = 0$ for the set $B$ of rational numbers. For $n = 1, 2, \dots$ let

\[ X_n(\omega) = \frac1n \lfloor nX(\omega) \rfloor. \]

Notice that $Q_n(B) = 1$, where $Q_n$ denotes the distribution of $X_n$. It is easy to
check that $X_n(\omega) \to X(\omega)$ for each $\omega$. It does not bother us that a sequence of
random variables each of which is rational with probability 1 can converge almost
surely to a random variable that is irrational with probability 1. So, it is natural
to look for a definition of convergence of a sequence of probability measures that
will entail $Q_n \to Q$ in this case, despite the fact that $Q_n(B) \not\to Q(B)$.


Example 3. Let $\Omega$ consist of the two points $-1$ and $1$, each of which has
probability $\frac12$. Set $X_n(\omega) = \omega/n$ and $X(\omega) = 0$. Clearly, $X_n(\omega) \to X(\omega)$ as
$n \to \infty$ for each of the two values of $\omega$. Let $F_n$ and $F$ denote the distribution
functions of $X_n$ and $X$, respectively. Notice that $F_n(x) \to F(x)$ for $x \ne 0$, but

\[ \tfrac12 = F_n(0) \not\to 1 = F(0). \]

Since $X_n \to X$, we want a definition of convergence of sequences of distribution
functions that will entail $F_n \to F$ in this case, even though pointwise convergence
to $F$ fails for the sequence $(F_n)$.


Definition 1. Let $F$ and $F_n$, $n = 1, 2, \dots$, be distribution functions for $\mathbb{R}$.
The sequence $(F_n : n = 1, 2, \dots)$ converges to $F$ if $F_n(x) \to F(x)$ as $n \to \infty$ for
every $x$ at which $F$ is continuous. Let $Q_n$ and $Q$ be the probability measures
corresponding to $F_n$ and $F$, respectively. Then $(Q_n : n = 1, 2, \dots)$ converges to
$Q$ if $(F_n : n = 1, 2, \dots)$ converges to $F$.


The convergence just defined is denoted by

\[ F_n \to F \quad \text{as } n \to \infty \]

for distribution functions, and by

\[ Q_n \to Q \quad \text{as } n \to \infty \]

for distributions.


Proposition 2. Let $F$ and $F_n$, $n = 1, 2, \dots$, be distribution functions for $\mathbb{R}$.
Then $F_n \to F$ as $n \to \infty$ if and only if there is a dense subset $D$ of $\mathbb{R}$ such that,
for every $x \in D$, $F_n(x) \to F(x)$ as $n \to \infty$.


Problem 1. Prove the preceding proposition. 


* Problem 2. Let $F_n$, $n = 1, 2, \dots$, $F$, and $G$ be distribution functions for $\mathbb{R}$.
Suppose that $F_n \to F$ and $F_n \to G$ as $n \to \infty$. Prove that $F = G$.


In view of the preceding problem we call $F$ the limit of the sequence
$(F_n : n = 1, 2, \dots)$ if $F_n \to F$ as $n \to \infty$, and we write

\[ \lim_{n \to \infty} F_n = F. \]

Similarly, if $Q_n \to Q$ as $n \to \infty$, we write

\[ \lim_{n \to \infty} Q_n = Q. \]


Problem 3. Show that a sequence $(\delta_{x_n} : n = 1, 2, \dots)$ of delta distributions
converges if and only if the corresponding sequence $(x_n : n = 1, 2, \dots)$ converges in
$\mathbb{R}$.


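* Problem 4. [Poisson limit of binomials] Fix $\lambda > 0$. For integers $n > \lambda$, let $Q_n$
denote the binomial probability distribution:

\[ Q_n\{x\} = \binom{n}{x} \Big(\frac{\lambda}{n}\Big)^x \Big(1 - \frac{\lambda}{n}\Big)^{n-x}, \quad 0 \le x \le n,\ x \in \mathbb{Z}. \]

Prove that $Q_n \to Q$, where $Q$ is the Poisson distribution with mean $\lambda$.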
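
A numerical experiment can make the claim of Problem 4 plausible before one proves it. The sketch below is an illustration only and is no substitute for a proof; the function name `cdf_gap` and the choice $\lambda = 2$ in the usage lines are assumptions. It builds both probability functions by their one-step recursions and reports the largest difference between the two distribution functions at the integers $0, \dots, n$.

```python
import math

def cdf_gap(n, lam):
    """Largest |F_n(x) - F(x)| over x = 0, ..., n, where F_n is the
    binomial(n, lam/n) distribution function and F the Poisson(lam) one.
    Requires n > lam so that lam/n < 1."""
    p = lam / n
    b = (1 - p) ** n        # binomial probability of 0
    q = math.exp(-lam)      # Poisson probability of 0
    fb = fq = gap = 0.0
    for x in range(n + 1):
        fb += b
        fq += q
        gap = max(gap, abs(fb - fq))
        b *= (n - x) / (x + 1) * p / (1 - p)  # binomial probability of x + 1
        q *= lam / (x + 1)                    # Poisson probability of x + 1
    return gap

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, cdf_gap(n, 2.0))
```

The recursions avoid large factorials, which would overflow floating-point arithmetic for large $n$.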


Problem 5. [Binomial continuity] Fix a positive integer $n$. For $0 \le p \le 1$, let $Q_p$
denote the binomial distribution given by

\[ Q_p\{x\} = \binom{n}{x} p^x (1-p)^{n-x}, \quad 0 \le x \le n,\ x \in \mathbb{Z}, \]

where the convention $0^0 = 1$ applies in case $p = 0$ or $p = 1$. Prove that the function
$p \mapsto Q_p$ is continuous in the sense that if $p_k \to p$ as $k \to \infty$, then $Q_{p_k} \to Q_p$ as
$k \to \infty$.


Problem 6. Mimic the preceding exercise for other one-parameter families of dis- 
tributions: geometric, exponential, standard gamma (that is, with a = 1), and 
Poisson. In cases where the parameter ranges over an interval, discuss the limiting 
behavior as the parameter approaches any finite endpoint. 




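Problem 7. [Beta continuity in two parameters] For $\alpha > 0$ and $\beta > 0$, let $Q_{\alpha,\beta}$
denote the beta distribution described in Example 3 of Chapter 3. Prove that the
function $(\alpha, \beta) \mapsto Q_{\alpha,\beta}$ is a continuous function. Decide which of the following
limits exist and evaluate those that do:

\[
\begin{array}{ll}
\lim_{\alpha \searrow 0} Q_{\alpha,\beta}\,; & \lim_{\beta \searrow 0} Q_{\alpha,\beta}\,; \\[4pt]
\lim_{\alpha \to \infty} Q_{\alpha,\beta}\,; & \lim_{\beta \to \infty} Q_{\alpha,\beta}\,; \\[4pt]
\lim_{\alpha \searrow 0} Q_{\alpha,\alpha}\,; & \lim_{\alpha \to \infty} Q_{\alpha,\alpha}\,; \\[4pt]
\lim_{(\alpha,\beta) \to (0,0)} Q_{\alpha,\beta}\,; & \lim_{(\alpha,\beta) \to (\infty,\infty)} Q_{\alpha,\beta}\,; \\[4pt]
\lim_{(\alpha,\beta) \to (0,\infty)} Q_{\alpha,\beta}\,; & \lim_{(\alpha,\beta) \to (\infty,0)} Q_{\alpha,\beta}\,.
\end{array}
\]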


Problem 8. [Continuity of the normal family] Prove that the family of distri- 
butions on R of normal type is continuous as a function of mean and standard 
deviation. Include the cases of zero standard deviation in your considerations. 


Problem 9. [Negative binomial continuity] Let $Q_{p,r}$, $0 < p < 1$ and $r > 0$, denote
negative binomial distributions as defined by

\[ k \mapsto \binom{-r}{k} (1-p)^r (-p)^k, \quad k = 0, 1, 2, \dots. \]

(See Problem 23 of Chapter 10.) Investigate the continuity of the function
$(p, r) \mapsto Q_{p,r}$.


Problem 10. [Gamma distribution as a limit] Let $Q_{p,r}$ be as in the preceding
exercise, fix $r > 0$, and let $R_m$ denote the distribution of the random variable
$m^{-1}X_m$, where the distribution of $X_m$ is $Q_{(1-1/m),r}$. Prove that, as $m \to \infty$,
$R_m \to R$, where $R$ is the gamma distribution defined in Example 2 of Chapter 3,
with parameters $a = 1$, $\gamma = r$. What is the situation for $r = 0$?


Problem 11. Modify the preceding exercise by letting the distribution of $X_m$ be
$Q_{(1-\lambda/m),r}$ for some fixed $\lambda > 0$.


14.2. Limit distributions for extreme values 


To further illustrate the concept of convergence for sequences of dis-
tributions and also to give a brief introduction to an important topic that will 
not be treated elsewhere in this book, we will examine distributions that arise 
naturally when studying the maximum of a large number of iid random variables. 


Problem 12. Let $(X_1, X_2, \dots)$ be an iid sequence of $\mathbb{R}$-valued random variables
having common distribution function $F$. Prove that the distribution function of
$\max\{X_k : 1 \le k \le n\}$ is $F^n$.


Problem 13. [Gumbel distribution] Let

\[ F(x) = e^{-e^{-x}}, \quad x \in \mathbb{R}. \]

Show that $F$ is a distribution function that, for each positive integer $n$, is of the
same type as $F^n$. (The density of $F$ is shown in Figure 14.1.)


Problem 14. Prove that if $F$ is a distribution function of Gumbel type, then

\[ F(x) = e^{-ce^{-ax}} \]

for some constants $a, c > 0$.


Problem 15. Let $(X_n : n = 1, 2, \dots)$ be an iid sequence of standard exponentially
distributed random variables having mean 1. For $n = 1, 2, \dots$, let
$M_n = \max\{X_k : 1 \le k \le n\}$ and let $G_n$ denote the distribution function of $(M_n - \log n)$.
Show that $G_n \to F$ as $n \to \infty$, where $F$ is a standard Gumbel distribution function
(defined in Problem 13).


Problem 16. Decide whether the sequence $(M_n - \log n)$ of the preceding problem
converges almost surely as $n \to \infty$. If not, does it converge in probability?
Comment on any connections that this and the preceding problem have with Example 6
of Chapter 9.


Problem 17. Let $X$ be distributed according to the Gumbel distribution of
Problem 13. Show that

\[ E(X) = -\Gamma'(1) = -\int_0^{\infty} e^{-x} \log x\, dx, \]

where $\Gamma$ denotes the gamma function. (By Problem 29 of Chapter 8 this constant
is Euler’s constant.)


FIGURE 14.1. The density of the standard Gumbel distribution 




Problem 18. [Weibull distributions] Let $\alpha > 0$. Show that the function

\[ x \mapsto \begin{cases} e^{-|x|^{\alpha}} & \text{if } x < 0 \\ 1 & \text{if } x \ge 0 \end{cases} \]

is a distribution function that, for every $n$, is of the same strict type as the $n$th
power of itself. (The parameter $\alpha$ is called the index of the Weibull distribution.)


Problem 19. Let $X$ be a random variable having a Weibull distribution of index
$\alpha$. Calculate the distribution function of $-X$.


Remark 1. The term ‘Weibull distribution’ is used more often for the distri- 
bution of —X obtained in the preceding problem than it is for the distribution 
in Problem 18. The reason is that the Weibull distribution arises in applications 
involving the minimum of a large number of nonnegative random variables. Note 
that X and —X are not of the same type. 


Example 4. Let $(X_1, X_2, \dots)$ be an iid sequence of random variables uniformly
distributed on $[0, 1]$ and set $M_n = \max\{X_k : 1 \le k \le n\}$. It is easy to
see that $M_n \to 1$ a.s.; so we consider random variables of the form $(M_n - 1)/a_n$
with the goal of obtaining a nondegenerate limit. To choose $a_n$ we calculate:

\[ P\Big[\frac{M_n - 1}{a_n} \le x\Big] = P[M_n \le 1 + xa_n] = \prod_{k=1}^{n} P[X_k \le 1 + xa_n] = (1 + xa_n)^n, \]

valid for all $x \le 0$ provided that $a_n$ is sufficiently small (depending on $x$). It
is clear that we can obtain the nondegenerate limit $e^x \wedge 1$ by taking $a_n = 1/n$.
Letting $G_n$ denote the distribution function of $n(M_n - 1)$, we conclude that
the sequence $(G_n : n = 1, 2, \dots)$ converges to the distribution function of the
negative of an exponentially distributed random variable having mean 1, that
is, a Weibull distribution of index 1.
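
The limit in Example 4 is easy to inspect numerically. In the Python sketch below, the names `G_n` and `G` mirror the notation of the example, while the sample points and the values of $n$ are arbitrary illustrative choices. It evaluates $(1 + x/n)^n$ next to $e^x \wedge 1$.

```python
import math

def G_n(x, n):
    """Distribution function of n(M_n - 1), M_n the maximum of n iid uniforms on [0, 1]."""
    if x >= 0:
        return 1.0
    if x <= -n:
        return 0.0
    return (1 + x / n) ** n

def G(x):
    """Limiting distribution function: e^x for x < 0 and 1 for x >= 0."""
    return min(math.exp(x), 1.0)

if __name__ == "__main__":
    for x in (-3.0, -1.0, -0.1, 0.5):
        print(x, [G_n(x, n) for n in (10, 1000, 10 ** 6)], G(x))
```

For each fixed $x$ the values of $G_n(x)$ settle down to $G(x)$ as $n$ grows, in line with the conclusion of the example.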


Problem 20. [Fréchet distributions] Let $\alpha > 0$. Show that the function

\[ x \mapsto \begin{cases} 0 & \text{if } x \le 0 \\ e^{-x^{-\alpha}} & \text{if } x > 0 \end{cases} \]

is a distribution function that, for every $n$, is of the same strict type as the $n$th
power of itself. (The parameter $\alpha$ is called the index of the Fréchet distribution.)


Problem 21. Let $(X_1, X_2, \dots)$ be an iid sequence of random variables having a
standard Cauchy distribution. For each $n$, let $M_n = \max\{X_k : 1 \le k \le n\}$. Find
constants $a_n > 0$ and $b_n$ such that the distribution of $(M_n - b_n)/a_n$ converges as $n \to \infty$
to a Fréchet distribution of index 1. Hint: You may want to prove and then use
the fact that $u[\frac{\pi}{2} - \arctan u] \to 1$ as $u \to \infty$.




* Problem 22. Calculate the mean and variance of the Weibull and Fréchet distri- 
butions defined in Problem 18 and Problem 20, writing all finite answers in terms 
of the gamma function. 


Problem 23. Show that all Fréchet distributions have continuous densities and 
calculate them. Show more: that each Fréchet distribution function has infinitely 
many continuous derivatives and that the value of each of the derivatives at the 
lower endpoint of the support of the distribution equals 0. Hint: This problem has 
some features in common with Problem 34 of Chapter 5. 


Remark 2. The problems and examples of this section illustrate the main
results of the theory of extreme values. It can be shown that, in general, if $M_n$
denotes the maximum (or minimum) of the first $n$ terms of an iid sequence and
if the distribution of $(M_n - b_n)/a_n$ converges as $n \to \infty$ to a nondegenerate limit for
some constants $a_n > 0$ and $b_n \in \mathbb{R}$, then the limiting distribution is of Gumbel,
Weibull, or Fréchet type.


14.3. Relationships to other types of convergence 


The concept of convergence of distribution functions defined in the first sec- 
tion has a nice relationship with almost sure convergence and convergence in 
probability. 


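Proposition 3. Let $X$ and $X_n$, $n = 1, 2, \dots$, be $\mathbb{R}$-valued random variables
on a common probability space and suppose that $X_n$ converges to $X$ either i.p.
or a.s. as $n \to \infty$. Then $Q_n \to Q$ as $n \to \infty$, where $Q_n$ and $Q$ denote the
distributions of $X_n$ and $X$, respectively.


PROOF. Since a.s. convergence implies convergence i.p. we will assume
convergence i.p. throughout. Let $F_n$ and $F$ denote the distribution functions of
$X_n$ and $X$, $x$ a point of continuity of $F$, and $\varepsilon > 0$. Choose $\delta > 0$ so that
$F(x+\delta) - F(x-\delta) < \varepsilon/2$, and choose $l$ so that $P[|X_n - X| > \delta] < \varepsilon/2$ whenever
$n \ge l$. Then, for $n \ge l$,

\[
\begin{aligned}
F_n(x) &= P[X_n \le x] \\
 &\le P[X \le x + \delta] + P[|X_n - X| > \delta] \\
 &< \big(F(x) + \tfrac{\varepsilon}{2}\big) + \tfrac{\varepsilon}{2} \\
 &= F(x) + \varepsilon,
\end{aligned}
\]

and

\[
\begin{aligned}
F(x) &= P[X \le x] \\
 &< P[X \le x - \delta] + \tfrac{\varepsilon}{2} \\
 &\le P[X_n \le x] + P[|X_n - X| > \delta] + \tfrac{\varepsilon}{2} \\
 &< P[X_n \le x] + \varepsilon \\
 &= F_n(x) + \varepsilon. \qquad \Box
\end{aligned}
\]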

Let $(X_n : n = 1, 2, \dots)$ be a sequence of $\mathbb{R}$-valued random variables, and
suppose that there is a random variable $X$ such that $Q_n \to Q$, where $Q_n$ and $Q$
are the distributions of $X_n$ and $X$, respectively. Then we say that the sequence
$(X_n)$ converges to $X$ in distribution and write

\[ X_n \xrightarrow{\mathcal{D}} X. \]


Since it is possible to have convergence in distribution even if all the random 
variables involved are defined on different probability spaces, care must be taken 
when using the phrase “converges in distribution”. When one has convergence 
in distribution, it is only the distribution of the limiting random variable that is 
uniquely determined by the sequence and not the limiting random variable itself. 
Thus, for instance, an iid sequence of random variables converges in distribution 
to any one of the random variables in the sequence. In particular, the converse 
of Proposition 3 is not true. Nevertheless, there are two important results that 
are in the converse direction. 


Proposition 4. Let $(X_n : n = 1, 2, \dots)$ be a sequence of $\mathbb{R}$-valued random
variables on a common probability space. Then, for any $c \in \mathbb{R}$, $X_n \to c$ in
probability as $n \to \infty$ if and only if $X_n \xrightarrow{\mathcal{D}} c$ as $n \to \infty$.


Problem 24. Prove the preceding proposition. 


Proposition 5. Let $Q$ and $Q_n$, $n = 1, 2, \dots$, be probability measures on $\mathbb{R}$
and suppose that $Q_n \to Q$ as $n \to \infty$. Then there exists a probability space
$(\Omega, \mathcal{F}, P)$ and $\mathbb{R}$-valued random variables $X$ and $X_n$, $n = 1, 2, \dots$, defined on
$\Omega$, such that the distributions of $X$ and $X_n$ are $Q$ and $Q_n$, respectively, and
$X_n \to X$ a.s. as $n \to \infty$.


PROOF. Let $F$ and $F_n$, $n = 1, 2, \dots$, be the distribution functions
corresponding to $Q$ and $Q_n$, respectively. Let $\Omega$ equal the interval $(0, 1)$, $\mathcal{F}$ its Borel
field, and $P$ Lebesgue measure on $(\Omega, \mathcal{F})$. For $\omega \in \Omega$, set

\[ X(\omega) = \inf\{x : F(x) \ge \omega\}, \]

as in Proposition 4 of Chapter 3, and

\[ X_n(\omega) = \inf\{x : F_n(x) \ge \omega\}. \]

(Recall that in a certain sense, $X$ and $F$ are inverses of each other, and similarly
for $X_n$ and $F_n$. Intervals of constancy of $F$ correspond to jumps of $X$, and jumps
of $F$ correspond to intervals of constancy of $X$.)

Fix $\omega$ and then fix $\delta > 0$ such that $X(\omega) - \delta$ is a point at which $F$ is continuous.
Clearly

\[ F(X(\omega) - \delta) < \omega, \]

and, hence,

\[ F_n(X(\omega) - \delta) < \omega \]

for all sufficiently large $n$. For such $n$, $X_n(\omega) > X(\omega) - \delta$. Thus,

\[ \liminf_{n \to \infty} X_n(\omega) \ge X(\omega) - \delta. \]

Now let $\delta \searrow 0$ to conclude that

\[ \liminf_{n \to \infty} X_n(\omega) \ge X(\omega). \]

Since $X$ is monotone, it has at most countably many points of discontinuity.
Thus, for almost every $\omega$, $X$ is continuous at $\omega$. For such an $\omega$ fix $\varepsilon > 0$ and then
fix $\delta > 0$ such that $X(\omega + \varepsilon) + \delta$ is a point at which $F$ is continuous. Clearly

\[ F(X(\omega + \varepsilon) + \delta) \ge \omega + \varepsilon. \]

So for all $n$ sufficiently large

\[ F_n(X(\omega + \varepsilon) + \delta) > \omega, \]

and, hence,

\[ X_n(\omega) \le X(\omega + \varepsilon) + \delta. \]

Thus,

\[ \limsup_{n \to \infty} X_n(\omega) \le X(\omega + \varepsilon) + \delta. \]

Now let $\delta \searrow 0$ and then $\varepsilon \searrow 0$ to conclude that

\[ \limsup_{n \to \infty} X_n(\omega) \le X(\omega). \]

The preceding two paragraphs show that $X_n(\omega) \to X(\omega)$ as $n \to \infty$ for almost
every $\omega$, in particular those at which $X$ is continuous. □
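
The construction in the proof of Proposition 5 can be tried out on the distributions of Example 1, where $F(x) = 1 - e^{-x}$ and $F_n(k/n) = 1 - (1 + 1/n)^{-(k+1)}$. The following Python sketch is a hedged illustration: the function names `X` and `X_n` follow the notation of the proof, and the explicit formula for the smallest admissible $k$ is derived here for this example rather than taken from the text. It evaluates both generalized inverses at a few points $\omega \in (0, 1)$.

```python
import math

def X(omega):
    """inf{x : F(x) >= omega} for the standard exponential distribution."""
    return -math.log(1 - omega)

def X_n(omega, n):
    """inf{x : F_n(x) >= omega} for the geometric-type distribution Q_n of Example 1.

    F_n(k/n) >= omega is equivalent to (k + 1) * log(1 + 1/n) >= -log(1 - omega),
    so we take the least integer k >= 0 with that property.
    """
    k = max(0, math.ceil(-math.log(1 - omega) / math.log1p(1 / n) - 1))
    return k / n

if __name__ == "__main__":
    for omega in (0.1, 0.5, 0.9):
        print(omega, [X_n(omega, n) for n in (10, 1000, 10 ** 5)], X(omega))
```

Here $X$ is continuous on $(0, 1)$, so the convergence $X_n(\omega) \to X(\omega)$ is expected at every $\omega$, not merely almost every one.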


Proposition 5 is very useful for the study of sequences of probability measures, 
for it enables one to use results about almost sure convergence. This feature is 
illustrated in Problem 25, Problem 26, Proposition 6, and Proposition 7. 


Problem 25. [Bounded Convergence Theorem for Distributions] Let $Q$ and $Q_n$,
$n = 1, 2, \dots$, be probability measures on $\mathbb{R}$, and suppose that $Q_n \to Q$ as $n \to \infty$ and
that there exists a single bounded set that supports every $Q_n$. Prove that

\[ \int x\, Q_n(dx) \to \int x\, Q(dx). \]


Problem 26. [Fatou Lemma for Distributions] Let $Q$ and $Q_n$, $n = 1, 2, \dots$, be
probability measures, each of which is supported by $[0, \infty)$, and suppose that
$Q_n \to Q$ as $n \to \infty$. Prove that

\[ \int x\, Q(dx) \le \liminf_{n \to \infty} \int x\, Q_n(dx). \]


Problem 27. [Uniform Integrability Criterion for Distributions] Prove that most 
of the Uniform Integrability Criterion holds with almost sure convergence replaced 
by convergence in distribution. Explain why one should not expect one aspect of 
the criterion to be valid for convergence in distribution. 


Proposition 6. Let $Q$, $R$, and $Q_n$ and $R_n$, $n = 1, 2, \dots$, be probability
measures on $\mathbb{R}$, and suppose that $Q_n \to Q$ and $R_n \to R$ as $n \to \infty$. Then
$Q_n * R_n \to Q * R$ as $n \to \infty$.


PROOF. By Proposition 5 there exist random variables $X$, $Y$, and $X_n$ and
$Y_n$, $n = 1, 2, \dots$, with respective distributions $Q$, $R$, and $Q_n$ and $R_n$, such
that $X_n \to X$ a.s. and $Y_n \to Y$ a.s. as $n \to \infty$. By using a product space,
as in Theorem 7 of Chapter 9, we may arrange for the collection of random
variables $\{X, X_1, X_2, \dots\}$ to be independent of the collection $\{Y, Y_1, Y_2, \dots\}$. So
the distribution of $X_n + Y_n$ is $Q_n * R_n$, the distribution of $X + Y$ is $Q * R$,
and $X_n + Y_n \to X + Y$ a.s. as $n \to \infty$. The desired conclusion follows from
Proposition 3. □


Problem 28. Explore the possibility of proving the preceding proposition without 
using random variables. Either decide that there is such a proof that is straight- 
forward and write it or identify specific difficulties that lead you to appreciate the 
value of Proposition 5. 


14.4. Convergence conditions for sequences of distributions 


The next proposition gives equivalent conditions for the convergence of a se- 
quence of probability measures. These conditions are meaningful for probability 
distributions on spaces other than R and thus will be used for generalization in 
Chapter 18, where the corresponding result is known as the Portmanteau
Theorem. In the proposition the notation $\partial B$ is used for the boundary of a set $B$,
the set obtained by removing the interior of $B$ from the closure of $B$.


Proposition 7. Let $Q$ and $Q_n$, $n = 1, 2, \dots$, be probability measures on $\mathbb{R}$.
Then the following conditions are equivalent:
(i) $Q_n \to Q$ as $n \to \infty$;
(ii) $\int g\, dQ_n \to \int g\, dQ$ as $n \to \infty$ for each bounded continuous function
$g$ on $\mathbb{R}$;
(iii) $\limsup_{n \to \infty} Q_n(C) \le Q(C)$ for each closed subset $C$ of $\mathbb{R}$;
(iv) $\liminf_{n \to \infty} Q_n(O) \ge Q(O)$ for each open subset $O$ of $\mathbb{R}$;
(v) $\lim_{n \to \infty} Q_n(B) = Q(B)$ for each Borel subset $B$ of $\mathbb{R}$ for which
$Q(\partial B) = 0$.


PROOF. We will prove the chain of implications (v) => (i) => (ii) => (iii) 
as well as the implications (iii) <=> (iv) and {(iii), (iv)} => (v). 

(v) => (i). For z a real number for which Q{z} = 0, let B = (—œ, z]. By 
(v) we conclude that Q,(—00, xz] > Q(—œ, z]. Therefore, Qn > Q. 

(i) => (ii). Let X and Xn be as in Proposition 5, and let g be a continuous 
bounded function on R. Then go Xn > go X a.s. By the Bounded Convergence 
Theorem, E(go Xn) > E(go X); that is, f gdQn > f gdQ. 

(ii) => (iii). Let C be a closed subset of R. In order to use (ii) we introduce 
continuous functions whose integrals with respect to Q approximate Q(C) by 
virtue of the fact that they equal 1 on C and equal 0 on the complement of a 
neighborhood of C: for k = 1,2,... and x € R, define 


gx(z) = (1 — kinf{|y —2z|: ye C}) VO. 


Thus, each gx is continuous (by the triangle inequality) and (0, 1]-valued, has the 
value 1 on C, and has the value 0 at points whose distance from C is greater 
than 1/k. So, for each x € R, g(z) N Ic(z) as k > œ. By the Bounded 
Convergence Theorem and (ii), 


QC) = jim, | 9 dQ 


= lim lim | gpdQn 


k> co NCO 


> lim sup lim sup Qn(C) 
k= œ 


N— oo 


= limsupQ,,(C). 


(iii) <= (iv). For an arbitrary closed set C and its complement, an open set 
O, we have 


Q(C)— lim sup Q,,(C) 
= 1— Q(O) ~ limsup[1 — 2,(0)] 
= lim inf Qn (O) — Q(Q). 


If either the left or right side in this equality is nonnegative, then so is the other. 
{ (iii), (iv)} => (v). Let B be a Borel set for which Q(OB) = 0. Let C and O 
denote the closure and interior of B, respectively. Then, 


lim sup @,,(B) < lim sup Qn(C) < Q(C) 
= Q(B) = Q(O) < liminf Q,,(O) < liminf Q,(B); 




so, equalities hold throughout. $\square$
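As a numerical illustration of condition (ii), the following sketch takes $Q_n$ to be the uniform distribution on $\{1/n, 2/n, \dots, 1\}$ and $Q$ the uniform distribution on $[0,1]$. The example, the test function $g = \cos$, and the helper name `integral_against_qn` are our own choices for illustration and do not come from the text.

```python
import math

def integral_against_qn(g, n):
    """Integral of g with respect to Q_n, the uniform distribution
    on the n points 1/n, 2/n, ..., 1 (each atom has mass 1/n)."""
    return sum(g(k / n) for k in range(1, n + 1)) / n

# Q is the uniform distribution on [0, 1]; the integral of cos against Q is sin(1).
limit_integral = math.sin(1.0)

for n in (10, 100, 1000, 10000):
    approx = integral_against_qn(math.cos, n)
    print(n, approx, abs(approx - limit_integral))
```

The same sequence shows why (v) is restricted to sets with $Q(\partial B) = 0$: for $B$ the set of rationals in $[0,1]$ we have $Q_n(B) = 1$ for every $n$ while $Q(B) = 0$, and here $\partial B = [0,1]$, so $Q(\partial B) = 1$.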


Problem 29. Let $(x_n : n = 1,2,\dots)$ denote a convergent sequence of real numbers
with limit $x$. For $n = 1,2,\dots$, let $Q_n$ denote the delta distribution at $x_n$; also,
let $Q$ denote the delta distribution at $x$. Show directly that each of (i)-(v) of the
preceding proposition holds. Also, show that strict inequality is possible in (iii)
and (iv).


Problem 30. Suppose that $Q_n \to Q$ and that every $Q_n$ is supported by a common
closed set $C$. Use the preceding proposition to prove that $Q$ is supported by $C$.


The following result concerns the convergence of probability measures with 
densities. 


Proposition 8. Suppose that probability measures $Q$ and $Q_n$, $n = 1,2,\dots$,
on $\mathbb{R}$ are all absolutely continuous with respect to a common $\sigma$-finite measure $\mu$,
and denote their respective densities by $f$ and $f_n$. If $f_n \to f$ $\mu$-almost everywhere,
then $Q_n \to Q$.


Problem 31. Prove the preceding proposition. 


Problem 32. Give an example that contradicts the converse of the last sentence in 
the preceding proposition. 


Problem 33. Give an example of a sequence $(Q_n : n = 1,2,\dots)$ of probability
distributions on $\mathbb{R}$ which satisfies: (i) there is a $\sigma$-finite measure $\mu$ such that, for
each $n$, $Q_n$ is absolutely continuous with respect to $\mu$ and, moreover, its density
$f_n$ is continuous; (ii) as $n \to \infty$, the sequence $(f_n)$ converges uniformly; (iii) the
sequence $(Q_n)$ does not converge.


14.5. Sequences of distributions on $\overline{\mathbb{R}}$


Let us turn our attention to probability measures on the extended real line
$\overline{\mathbb{R}} = [-\infty, \infty]$. Definition 1 carries over directly to the $\overline{\mathbb{R}}$-setting: simply replace
“$\mathbb{R}$” by “$\overline{\mathbb{R}}$” in the first sentence of the definition.

Some books and articles use a term different from ‘convergence in distribution’
when speaking of $\overline{\mathbb{R}}$. We will not adopt this practice and will pay the price of
having to state explicitly whether we are working in $\mathbb{R}$ or $\overline{\mathbb{R}}$ whenever there is a
chance of ambiguity.


Example 5. Let

$$F_n(x) = \begin{cases} 0 & \text{if } x < -n \\ 1/2 & \text{if } -n \le x < n \\ 1 & \text{if } n \le x. \end{cases}$$

Each $F_n$ is a distribution function for $\mathbb{R}$, and in the $\mathbb{R}$-setting the sequence
$(F_n : n = 1,2,\dots)$ fails to converge. However, each $F_n$ is also a distribution
function for $\overline{\mathbb{R}}$, and in the $\overline{\mathbb{R}}$-setting the sequence does converge, namely to the
distribution function that is identically equal to 1/2. In this case, the limiting
distribution is $\frac{1}{2}(\delta_{-\infty} + \delta_{\infty})$.
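The pointwise behavior in Example 5 can be checked directly. The sketch below evaluates $F_n$ at a few fixed real numbers; the function name `F` and the sample points are our own choices. For every fixed real $x$ the values settle at 1/2 once $n > |x|$, which is the sense in which half of the mass moves off to $-\infty$ and half to $+\infty$.

```python
def F(n, x):
    """Distribution function F_n of Example 5, evaluated at a real number x."""
    if x < -n:
        return 0.0
    if x < n:
        return 0.5
    return 1.0

# For each fixed x, F_n(x) is eventually equal to 1/2.
for x in (-50.0, 0.0, 50.0):
    print(x, [F(n, x) for n in (1, 10, 100, 1000)])
```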




Problem 34. For the one-parameter families of distributions (geometric, exponential,
gamma, and Poisson), discuss, for the $\overline{\mathbb{R}}$-setting, the limiting behavior as the
parameter approaches the endpoints, finite or infinite, of the interval in which it
takes its values.


Problem 35. Find a sequence $(Q_n : n = 1,2,\dots)$ of probability measures on $\overline{\mathbb{R}}$ such
that $Q_n(\{\infty\} \cup [-\infty, 0)) = 0$ for each $n$, $\int x\,Q_n(dx) < \infty$ for each $n$, $\int x\,Q_n(dx) \to \infty$,
and $Q_n \to Q$ for some $Q$ for which $Q\{\infty\} = 0$ and $\int x\,Q(dx) < \infty$.


Problem 36. Adapt Problem 2, Proposition 3, Proposition 5, Proposition 7, and
Problem 30 to the $\overline{\mathbb{R}}$-setting.


The next two problems involve the zeta distribution, which was introduced in
an optional section of Chapter 9. We use the notation $bB$ to denote the set

$$\{bx : x \in B\} \quad \text{for } b \in \mathbb{R} \text{ and } B \subseteq \mathbb{R}.$$


* Problem 37. Let $P_z$ denote the zeta distribution. That is, $P_z$ is the distribution
supported by $\mathbb{Z}^+ \setminus \{0\}$ and satisfying $P_z(\{k\}) = \frac{1}{\zeta(z)\,k^z}$, where $\zeta$ is the Riemann
zeta function. Fix a number $c > 1$, and for $z > 1$ and each Borel set $B$, set
$Q_z(B) = P_z(c^{1/(z-1)}B)$. For the $\overline{\mathbb{R}}$-setting show that, as $z \searrow 1$, $Q_z \to Q$, where
$Q\{\infty\} = 1/c$ and $Q\{0\} = (c-1)/c$. (Of course, were we in the $\mathbb{R}$-setting we would
say that $\lim_{z \searrow 1} Q_z$ does not exist.)


Problem 38. Let $P_z$ be as in the preceding problem, and let $X_z$ be a random
variable with distribution $P_z$. Let $Y$ be a random variable with the standard
exponential distribution. Show that $(z - 1)\log X_z \xrightarrow{\mathcal{D}} Y$ as $z \searrow 1$.


14.6. Relative sequential compactness 


In attempting to prove that a sequence of probability measures converges, often
the first step is to show that it has a convergent subsequence. In this section we
develop criteria for determining whether a sequence of probability measures on
$\mathbb{R}$ or $\overline{\mathbb{R}}$ has a convergent subsequence.


Definition 9. For either the $\mathbb{R}$- or the $\overline{\mathbb{R}}$-setting, a family $\mathcal{Q}$ of probability
distributions is relatively sequentially compact if every sequence $(Q_n : n = 1,2,\dots)$
of members of $\mathcal{Q}$ has a convergent subsequence.




The word ‘relatively’ is used in the preceding definition because it is not
required that the limit belong to $\mathcal{Q}$, but only that it be a probability distribution
on $\mathbb{R}$ or $\overline{\mathbb{R}}$, whichever is relevant. Now we come to a fact that makes the study of
convergence in distribution more pleasant in the $\overline{\mathbb{R}}$-setting than in the $\mathbb{R}$-setting.


Theorem 10. Every set of probability distributions on $\overline{\mathbb{R}}$ is relatively sequentially
compact; that is, every sequence of probability distributions on $\overline{\mathbb{R}}$ has a
convergent subsequence.


PROOF. Let $S = (F_n : n = 1,2,\dots)$ be a sequence of distribution functions
for $\overline{\mathbb{R}}$. Since the values of distribution functions lie in the bounded interval $[0,1]$,
the sequence $(F_n(x) : n = 1,2,\dots)$ has a convergent subsequence for any $x \in \mathbb{R}$.
By using the Cantor diagonalization procedure, we can find a single subsequence
$(F_{n_k} : k = 1,2,\dots)$ such that $(F_{n_k}(r))$ converges to a limit $G(r)$ for every rational
number $r$. Since each $F_n$ is increasing, so is $G$. Hence, for $x \in \mathbb{R}$, we may define

$$F(x) = \lim_{\substack{r \searrow x \\ r \text{ rational}}} G(r).$$

We will finish the proof by showing that $F$ is a distribution function for $\overline{\mathbb{R}}$
and that $F_{n_k} \to F$. That $F$ is increasing follows from the fact that $G$ is. Since
all values of each $F_n$ lie in $[0,1]$, so do all values of $G$ and, hence, all values of $F$.
To show that $F$ is right-continuous at $x$, let $\varepsilon > 0$ and choose a rational $r > x$
such that $G(r) < F(x) + \varepsilon$. Then, for any $y \in (x,r)$, $F(y) < F(x) + \varepsilon$. It follows
that $F$ is right-continuous, and so a distribution function for $\overline{\mathbb{R}}$.

Let $x$ be a point of continuity of $F$ and let $\gamma > 0$. We want to show that both

(14.1)    $F_{n_k}(x) < F(x) + \gamma$

and

(14.2)    $F_{n_k}(x) > F(x) - \gamma$

for all sufficiently large $k$. To show (14.1) choose a rational $r > x$ so that
$G(r) < F(x) + \gamma/2$, and choose $l_1$ so that, for all $k \ge l_1$, $F_{n_k}(r) < G(r) + \gamma/2$.
Then, for $k \ge l_1$,

$$F_{n_k}(x) \le F_{n_k}(r) < G(r) + \gamma/2 < F(x) + \gamma,$$

as desired. To prove (14.2) choose a $z < x$ such that $F(z) > F(x) - \gamma/2$.
Next fix a rational $s \in (z,x]$ and choose an integer $l_2$ so that, for all $k \ge l_2$,
$F_{n_k}(s) > G(s) - \gamma/2$. Then, for $k \ge l_2$,

$$F_{n_k}(x) \ge F_{n_k}(s) > G(s) - \gamma/2 \ge F(z) - \gamma/2 > F(x) - \gamma.$$

Therefore, for $k \ge l_1 \vee l_2$, both (14.1) and (14.2) hold. $\square$




Problem 39. Create an example that shows the possibility of there existing a
rational number $r$ for which $F(r) \ne G(r)$, where $F$ and $G$ are as defined in the
preceding proof.


Problem 40. Create an example that shows the possibility of $F(r) \ne G(r)$ for all
rational $r$, where $F$ and $G$ are as defined in the preceding proof.


For the $\mathbb{R}$-setting, the set of all distribution functions is not relatively sequentially
compact, since a sequence of distributions on $\mathbb{R}$ can converge to a distribution
that gives positive probability to $\infty$ (see, for instance, Problem 37). The
phrase “mass can escape to infinity” is sometimes used as an informal description
of this possibility. Any characterization of relative sequential compactness
of a collection $\mathcal{Q}$ of probability measures on $\mathbb{R}$ will be equivalent to the following
condition, which says in a formal way that mass does not escape to infinity:

(14.3)    $\lim_{b\to\infty}\,\sup_{Q \in \mathcal{Q}} Q[-b,b]^c = 0$

(see Theorem 13).
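To make condition (14.3) concrete, the sketch below computes $\sup_Q Q[-b,b]^c$ over two finite samples of normal families. The families, the truncation to 200 members, and the function names are assumptions made only for this illustration. In the first family the standard deviations stay bounded and the supremum of the tails becomes small as $b$ grows; in the second the means drift to infinity, so for any fixed $b$ some member puts almost all of its mass outside $[-b,b]$.

```python
import math

def normal_tail_outside(b, mean, sd):
    """Q[-b, b]^c for Q = N(mean, sd^2), computed with the complementary error function."""
    upper = 0.5 * math.erfc((b - mean) / (sd * math.sqrt(2.0)))  # Q(b, infinity)
    lower = 0.5 * math.erfc((b + mean) / (sd * math.sqrt(2.0)))  # Q(-infinity, -b)
    return upper + lower

def sup_tail(b, family):
    """Largest value of Q[-b, b]^c over a finite list of (mean, sd) pairs."""
    return max(normal_tail_outside(b, m, s) for (m, s) in family)

bounded_family = [(0.0, 1.0 + 1.0 / n) for n in range(1, 201)]  # N(0, (1 + 1/n)^2)
drifting_family = [(float(n), 1.0) for n in range(1, 201)]      # N(n, 1)

for b in (1.0, 5.0, 10.0):
    print(b, sup_tail(b, bounded_family), sup_tail(b, drifting_family))
```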

It will become apparent in the remainder of this chapter and also in Chap- 
ters 15 and 16 that characteristic functions are important in the study of con- 
vergence in distribution. The next lemma will enable us to obtain a condition 
on the behavior of characteristic functions near 0 that is equivalent to (14.3). 


Lemma 11. Let $Q$ be a probability measure on $\mathbb{R}$. For all $b > 0$,

$$(1 - \sin 1)\,Q[-b,b]^c \le b\int_0^{1/b} [1 - \Re(\beta(v))]\,dv \le 2\,Q[-\sqrt{b},\sqrt{b}]^c + \frac{1}{6b},$$

where $\beta$ denotes the characteristic function of $Q$ and $\Re(\beta(v))$ denotes the real
part of $\beta(v)$.


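PROOF. By the Fubini Theorem

$$\begin{aligned}
b\int_0^{1/b} [1 - \Re(\beta(v))]\,dv &= b\int\!\!\int_0^{1/b} (1 - \cos(vx))\,dv\,Q(dx) \\
&= \int \Bigl(1 - \frac{\sin(x/b)}{x/b}\Bigr)\,Q(dx) \ge \int_{[-b,b]^c} \Bigl(1 - \frac{\sin(x/b)}{x/b}\Bigr)\,Q(dx) \\
&\ge \int_{[-b,b]^c} (1 - \sin 1)\,Q(dx) = (1 - \sin 1)\,Q[-b,b]^c,
\end{aligned}$$

proving the first inequality. We have used the fact, easily verified by elementary
calculus, that the function $y \mapsto 1 - \sin(y)/y$ takes its minimum value in the
interval $[1,\infty)$ at $y = 1$.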




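The second inequality follows from the following calculation, which holds for
all $v \in \mathbb{R}$ and $b > 0$:

$$\begin{aligned}
1 - \Re(\beta(v)) &= \int (1 - \cos(vx))\,Q(dx) \\
&= \int_{[-\sqrt{b},\sqrt{b}]^c} (1 - \cos(vx))\,Q(dx) + \int_{[-\sqrt{b},\sqrt{b}]} (1 - \cos(vx))\,Q(dx) \\
&\le 2\,Q[-\sqrt{b},\sqrt{b}]^c + \int_{[-\sqrt{b},\sqrt{b}]} \frac{v^2x^2}{2}\,Q(dx) \le 2\,Q[-\sqrt{b},\sqrt{b}]^c + \frac{v^2b}{2}. \qquad \square
\end{aligned}$$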
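As a sanity check on the two inequalities of Lemma 11, the sketch below evaluates the three quantities for the standard normal distribution, whose characteristic function is $\beta(v) = e^{-v^2/2}$. The choice of distribution and the function name `lemma11_terms` are ours. The remainder term on the right is taken to be $1/(6b)$, which is what integrating a bound of the form $v^2b/2$ over $[0, 1/b]$ and multiplying by $b$ produces; this is an assumption about the exact constant, and any larger remainder gives a weaker inequality that also holds.

```python
import math

def lemma11_terms(b):
    """Return (left, middle, right) in the inequalities of Lemma 11 for Q = N(0, 1)."""
    def tail(a):
        # Q[-a, a]^c for the standard normal distribution
        return math.erfc(a / math.sqrt(2.0))

    left = (1.0 - math.sin(1.0)) * tail(b)
    # b * integral over [0, 1/b] of (1 - exp(-v^2 / 2)) dv, in closed form via erf
    middle = 1.0 - b * math.sqrt(math.pi / 2.0) * math.erf(1.0 / (b * math.sqrt(2.0)))
    right = 2.0 * tail(math.sqrt(b)) + 1.0 / (6.0 * b)
    return left, middle, right

for b in (0.5, 1.0, 2.0, 5.0, 10.0):
    print(b, lemma11_terms(b))
```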


We need one more definition before stating a theorem which gives several 
equivalent criteria for relative sequential compactness. 


Definition 12. A set $\mathcal{U}$ of $\mathbb{R}$-valued functions defined in a neighborhood of
a point $x_0$ is equicontinuous at $x_0$ if for every $\varepsilon > 0$ there is a neighborhood $N$
of $x_0$ such that $|w(x) - w(x_0)| < \varepsilon$ whenever $x \in N$ and $w \in \mathcal{U}$.


Note that condition (ii) in the following theorem is obviously equivalent to 
(14.3). 


Theorem 13. Let $\mathcal{Q}$ be a set of probability measures on $\mathbb{R}$, and $\mathcal{U}$ the set of
their characteristic functions. Then the following four statements are equivalent:

(i) $\mathcal{Q}$ is relatively sequentially compact;

(ii) for every $\varepsilon > 0$ there exists a bounded subset $B$ of $\mathbb{R}$ such that $Q(B) > 1 - \varepsilon$
for all $Q \in \mathcal{Q}$;

(iii) $\mathcal{U}$ is equicontinuous at 0;

(iv) for every $\varepsilon > 0$ there exists a $b > 0$ such that

$$b\int_0^{1/b} [1 - \Re(\beta(v))]\,dv < \varepsilon$$

for all but finitely many $\beta \in \mathcal{U}$.


PROOF. We first show that (i) implies (ii). For a proof by contradiction,
suppose that (i) holds but that (ii) does not hold. Since (ii) does not hold, there
exist an $\varepsilon > 0$ and a sequence $(Q_n : n = 1,2,\dots)$ of measures in $\mathcal{Q}$ such that for
every $n$, $Q_n[-n,n]^c \ge \varepsilon$. Thus, for all $b > 0$,

(14.4)    $\limsup_{n\to\infty} Q_n(-b,b) \le 1 - \varepsilon$.

By (i), $(Q_n)$ has a convergent subsequence; call its limit $Q$. By (14.4) and
Proposition 7, $Q(-b,b) \le 1 - \varepsilon$ for all $b > 0$. This is a contradiction, since no
probability measure on $\mathbb{R}$ has this property.

Next we show that (ii) implies (i). We may regard the members of $\mathcal{Q}$ as
distributions on $\overline{\mathbb{R}}$. By Theorem 10, any sequence $(Q_n : n = 1,2,\dots)$ in $\mathcal{Q}$ has a
convergent subsequence $(Q_{n_k} : k = 1,2,\dots)$ whose limit $Q$ is a distribution on $\overline{\mathbb{R}}$.
By the extension of Proposition 7 to the $\overline{\mathbb{R}}$-setting (see Problem 36),
$Q[-b,b] \ge \limsup_{k\to\infty} Q_{n_k}[-b,b]$ for all $b > 0$. It follows from (ii) that $\lim_{b\to\infty} Q[-b,b] = 1$,
so $Q$ is a distribution on $\mathbb{R}$. Thus (i) holds.

So far we have shown that (i) and (ii) are equivalent. Since, as mentioned 
earlier, (ii) is equivalent to (14.3), the second inequality in Lemma 11 makes it 
clear that (ii) implies (iv). It is also easy to see that if U is equicontinuous at 0, 
then (iv) holds. So (iii) implies (iv). 

Now we show that (iv) implies (ii). The first inequality in Lemma 11 shows
that if (iv) holds, then for any $\varepsilon > 0$, there exists an $a > 0$ such that $Q[-a,a]^c < \varepsilon$
for all $Q \in \mathcal{Q}$ except possibly for $Q$ in some finite set $\mathcal{Q}' \subseteq \mathcal{Q}$. For each $Q \in \mathcal{Q}'$,
the Continuity of Measure Theorem implies that $\lim_{a\to\infty} Q[-a,a]^c = 0$. Since
$\mathcal{Q}'$ is finite, there exists an $a' > 0$ such that $Q[-a',a']^c < \varepsilon$ for all $Q \in \mathcal{Q}'$. Now
(ii) holds with $b = \max\{a,a'\}$.

Finally, to complete the proof, we show that (ii) implies (iii). Let $Q$ be a
measure in $\mathcal{Q}$, with characteristic function $\beta$. The following calculation is very
similar to one made in greater detail in the proof of Lemma 11, except that this
time we use the inequality $|1 - e^{ivx}| \le 2|vx|$ (see E.9 of Appendix E):

$$\begin{aligned}
|1 - \beta(v)| &\le \int |1 - e^{ivx}|\,Q(dx) \\
&\le 2\,Q[-b,b]^c + \int_{[-b,b]} |1 - e^{ivx}|\,Q(dx) \\
&\le 2\bigl(Q[-b,b]^c + |vb|\bigr).
\end{aligned}$$

Combining this inequality with (ii), we see that for any $\varepsilon > 0$, we may find a
$b > 0$ such that

$$|1 - \beta(v)| < \varepsilon/2 + 2|vb|$$

for $\beta \in \mathcal{U}$. Let $N = (-\varepsilon/(4b), \varepsilon/(4b))$, so that $2|vb| < \varepsilon/2$ for $v \in N$. Then
$|1 - \beta(v)| < \varepsilon$ for all $v \in N$ and $\beta \in \mathcal{U}$. Thus $\mathcal{U}$ is equicontinuous at 0. $\square$


Problem 41. Show that the phrase “but finitely many” can be dropped from (iv) of 
Theorem 13, and that it can be added to (ii). (The phrase is included in the theorem 
to make it easier to use (iv) as a criterion for relative sequential compactness. See, 
for example, the proof of Theorem 15.) 


We conclude this section with a result which is quite useful for showing that 
a sequence of probability measures has a limit. Its proof is very similar to 
the analogous result concerning sequences of real numbers, Proposition 4 of 
Appendix B. 


Theorem 14. In either the $\mathbb{R}$- or $\overline{\mathbb{R}}$-setting, suppose that $(Q_n : n = 1,2,\dots)$
is a sequence of probability distributions that has the property that, for some
probability distribution $Q$, every subsequence has a further subsequence that converges
to $Q$. Then $Q_n \to Q$ as $n \to \infty$.




Problem 42. Prove the preceding theorem. 


14.7. The Continuity Theorem 


From part (ii) of Proposition 7 we see that if a sequence of distributions con- 
verges, then the corresponding sequence of characteristic functions converges. 
The following theorem both strengthens this assertion and provides a converse. 
It is called the Continuity Theorem because it says that, in a certain sense, the 
function that maps every probability measure on R to its characteristic function 
is continuous and has a continuous inverse. 


Theorem 15. [Continuity (for Characteristic Functions)] A sequence of
probability distributions on $\mathbb{R}$ converges to a probability distribution $Q$ if and
only if the sequence of corresponding characteristic functions converges pointwise
to a function $\gamma$ which is continuous at 0, in which case the convergence to
$\gamma$ is uniform on $[-u,u]$ for every $u \in \mathbb{R}^+$, and $\gamma$ is the characteristic function of
$Q$.


PROOF. Suppose first that $Q_n \to Q$ as $n \to \infty$ and denote the characteristic
functions of $Q_n$ and $Q$ by $\beta_n$ and $\beta$. All characteristic functions are continuous,
so, in particular, $\beta$ is continuous at 0. We already know from the Bounded
Convergence Theorem that $\beta_n(v) \to \beta(v)$ as $n \to \infty$ for all $v \in \mathbb{R}$. It is the
uniform convergence on bounded intervals that needs to be proved.

By Proposition 5 there exist a probability space and random variables $X$ and
$X_n$, $n = 1,2,\dots$, defined on that probability space such that $Q$ is the distribution
of $X$, $Q_n$ is the distribution of $X_n$ for $n = 1,2,\dots$, and $X_n \to X$ a.s.

Fix $u \in \mathbb{R}^+$. By E.9 of Appendix E,

$$|e^{ivX_n} - e^{ivX}| \le 2|v|\,|X_n - X|.$$

Hence

$$\sup_{|v| \le u} |e^{ivX_n} - e^{ivX}| \le 2u|X_n - X| \to 0 \quad \text{a.s. as } n \to \infty.$$

Then

$$\begin{aligned}
\sup_{|v| \le u} |\beta_n(v) - \beta(v)| &= \sup_{|v| \le u} |E(e^{ivX_n}) - E(e^{ivX})| \\
&\le \sup_{|v| \le u} E(|e^{ivX_n} - e^{ivX}|) \\
&\le E\Bigl(\sup_{|v| \le u} |e^{ivX_n} - e^{ivX}|\Bigr),
\end{aligned}$$

which, by the Bounded Convergence Theorem, approaches 0 as $n \to \infty$, giving
the desired uniform convergence.

For the converse, let $\mathcal{Q} = (Q_n : n = 1,2,\dots)$ be a sequence of probability
measures on $\mathbb{R}$ with corresponding characteristic functions $\beta_n$, and suppose that
there is a function $\gamma$, continuous at 0, such that $\beta_n(v) \to \gamma(v)$ as $n \to \infty$ for
each $v \in \mathbb{R}$.

Since $\gamma$ is continuous at 0 and $\gamma(0) = \lim_n \beta_n(0) = 1$, for every $\varepsilon > 0$ there exists
a $b > 0$ such that

$$b\int_0^{1/b} [1 - \Re(\gamma(v))]\,dv < \varepsilon.$$

By the Bounded Convergence Theorem,

$$b\int_0^{1/b} [1 - \Re(\beta_n(v))]\,dv < \varepsilon$$

for all but finitely many $n$. We have thus verified condition (iv) of Theorem 13,
so the sequence $\mathcal{Q}$ is relatively sequentially compact. In particular, it has a
convergent subsequence.

By the first paragraph of this proof, the characteristic function of the limit
of any convergent subsequence of $\mathcal{Q}$ must be $\gamma$. So, all convergent subsequences
have the same limit, namely the distribution $Q$ whose characteristic function is
$\gamma$. By Theorem 14 we conclude that $Q_n \to Q$ as $n \to \infty$. $\square$


The preceding proof illustrates a useful technique. When proving an implica- 
tion and its converse, prove the easier of the two implications first and then use 
that implication, if possible, in the proof of the converse. 
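Two small computations may help fix the content of the Continuity Theorem. Both examples and all function names below are our own choices. In the first, $Q_n = N(1/n, 1)$ converges to $Q = N(0,1)$, and a grid approximation of $\sup_{|v| \le u} |\beta_n(v) - \beta(v)|$ shrinks with $n$, in line with the uniform convergence on bounded intervals. In the second, $Q_n$ is uniform on $[-n,n]$; its characteristic function $\sin(nv)/(nv)$ tends to 0 for every $v \ne 0$ but equals 1 at $v = 0$, so the pointwise limit is not continuous at 0 and the sequence does not converge in the $\mathbb{R}$-setting.

```python
import cmath
import math

def cf_normal(v, mean, sd):
    """Characteristic function of N(mean, sd^2) at v."""
    return cmath.exp(1j * v * mean - 0.5 * (sd * v) ** 2)

def cf_uniform(v, n):
    """Characteristic function of the uniform distribution on [-n, n] at v."""
    return 1.0 if v == 0 else math.sin(n * v) / (n * v)

def sup_gap(n, u=5.0, grid=2001):
    """Grid approximation of sup over |v| <= u of |beta_n(v) - beta(v)|
    for Q_n = N(1/n, 1) and Q = N(0, 1)."""
    vs = [-u + 2.0 * u * i / (grid - 1) for i in range(grid)]
    return max(abs(cf_normal(v, 1.0 / n, 1.0) - cf_normal(v, 0.0, 1.0)) for v in vs)

for n in (1, 10, 100, 1000):
    print(n, sup_gap(n), cf_uniform(0.5, n), cf_uniform(0.0, n))
```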


Problem 43. Use the preceding theorem to redo Problem 8 and Problem 10. 


A consequence of Proposition 4 and Theorem 15 is that a sequence of random
variables converges in probability to the zero random variable if and only if
the corresponding sequence of characteristic functions converges to the function
$v \mapsto 1$. The following useful fact strengthens the ‘if part’ of the last sentence.


Lemma 16. Let $(\beta_k : k = 1,2,\dots)$ be a sequence of characteristic functions
such that $\beta_k(u) \to 1$ as $k \to \infty$ for every $u$ in some open interval containing 0.
Then, for every $u \in \mathbb{R}$, $\beta_k(u) \to 1$ as $k \to \infty$.


* Problem 44. Prove the preceding lemma. Hint: Use positive definiteness. 


Problem 45. Let $(X_n : n = 1,2,\dots)$ be a sequence of $\mathbb{R}$-valued random variables
defined on a common probability space and $(\beta_n : n = 1,2,\dots)$ the corresponding
sequence of characteristic functions. Prove that $X_n \to c$ in probability as $n \to \infty$
if and only if, for every $u$ in some open interval containing 0, $\beta_n(u) \to e^{iuc}$ as
$n \to \infty$.


The stories related to convergence of sequences of moment generating func- 
tions and probability generating functions are similar to that just described for 
characteristic functions. The monotonicity of the two types of generating func- 
tions makes it easy to prove the following six relevant results for these settings. 




Theorem 17. In the $\mathbb{R}^+$-setting, a set of probability distributions is relatively
sequentially compact if and only if the set of corresponding moment generating
functions is equicontinuous at 0.


Theorem 18. In the $\mathbb{Z}^+$-setting, a set of probability distributions is relatively
sequentially compact if and only if the set of corresponding probability generating
functions is equicontinuous at 1.


Theorem 19. [Continuity (for Moment Generating Functions)] A sequence
of distributions on $\mathbb{R}^+$ converges to a distribution $Q$ on $\mathbb{R}^+$ if and only if the
sequence of corresponding moment generating functions converges pointwise to
a function $\varphi$ that is continuous at 0, in which case $\varphi$ is the moment generating
function of $Q$.


Theorem 20. [Continuity (for Probability Generating Functions)] A sequence
of distributions on $\mathbb{Z}^+$ converges to a distribution $Q$ on $\mathbb{Z}^+$ if and only
if the sequence of corresponding probability generating functions converges pointwise
to a function $\rho$ for which $\rho(1-) = 1$, in which case $\rho$ is the probability
generating function of $Q$.
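A quick numerical look at Theorem 20, using the binomial distributions with parameters $n$ and $\lambda/n$: their probability generating functions $s \mapsto (1 + \lambda(s-1)/n)^n$ converge pointwise on $[0,1]$ to $s \mapsto e^{\lambda(s-1)}$, which is the probability generating function of the Poisson distribution with mean $\lambda$ and has left limit 1 at $s = 1$. The choice of example, the value $\lambda = 3$, and the function names are ours; the computation illustrates the hypothesis of the theorem and is not a proof of convergence.

```python
import math

def pgf_binomial(s, n, lam):
    """Probability generating function of the binomial(n, lam/n) distribution at s."""
    return (1.0 + lam * (s - 1.0) / n) ** n

def pgf_poisson(s, lam):
    """Probability generating function of the Poisson(lam) distribution at s."""
    return math.exp(lam * (s - 1.0))

lam = 3.0
for n in (10, 100, 1000, 100000):
    gaps = [abs(pgf_binomial(s, n, lam) - pgf_poisson(s, lam)) for s in (0.0, 0.25, 0.5, 0.75, 1.0)]
    print(n, max(gaps))
```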


Theorem 21. A sequence of distributions on $\overline{\mathbb{R}}^+$ converges to a distribution
$Q$ if and only if the sequence of corresponding moment generating functions
converges pointwise to a function $\varphi$, in which case $\varphi$ is the moment generating
function of $Q$.


Theorem 22. A sequence of distributions on $\overline{\mathbb{Z}}^+$ converges to a distribution
$Q$ if and only if the sequence of corresponding probability generating functions
converges pointwise to a function $\rho$, in which case $\rho$ is the probability generating
function of $Q$.


Problem 46. Prove a representative subset of the preceding six theorems. 


Problem 47. Use probability generating functions to redo Problem 4 and Prob- 
lem 5. 


* Problem 48. Use probability generating functions to redo Problem 9. 
* Problem 49. Use moment generating functions to redo Example 1 and Problem 10. 


Problem 50. Do Problem 34 again, this time by using Theorem 21 and Theo- 
rem 22. 




14.8. Scaling and centering of sequences of distributions 


Let $(Q_n : n = 1,2,\dots)$ be a sequence of probability measures on $\mathbb{R}$. Suppose
that there is some pattern in the definitions of the $Q_n$, so that one is motivated
to describe the behavior of the sequence $(Q_n)$ for large $n$. Often, the best way to
try to describe such behavior is in terms of appropriately ‘centered’ and ‘scaled’
versions of the distributions $Q_n$. More precisely, as in Problem 21, one looks for
a sequence $(a_n : n = 1,2,\dots)$ of positive constants (the ‘scaling’ constants) and
another sequence $(b_n : n = 1,2,\dots)$ of real constants (the ‘centering’ constants),
such that the sequence $(R_n : n = 1,2,\dots)$ of distributions converges, where, for
each $n$,

$$R_n(B) = Q_n(a_nB + b_n), \quad B \in \mathcal{B}.$$

Notice that if $Q_n$ is the distribution of an $\mathbb{R}$-valued random variable $V_n$, then
$R_n$ is the distribution of the random variable $(V_n - b_n)/a_n$.
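For a concrete instance of scaling and centering, take $Q_n$ to be the distribution of the maximum of $n$ independent standard exponential random variables, so that $Q_n(-\infty, y] = (1 - e^{-y})^n$ for $y > 0$. With the choices $a_n = 1$ and $b_n = \log n$, the relation $R_n(B) = Q_n(a_nB + b_n)$ gives $R_n(-\infty, x] = (1 - e^{-x}/n)^n$ for large $n$, which tends to $\exp(-e^{-x})$. The example, the constants, and the function names are our own and serve only as an illustration.

```python
import math

def Q_cdf(n, y):
    """Q_n(-infinity, y], where Q_n is the distribution of the maximum
    of n independent standard exponential random variables."""
    return (1.0 - math.exp(-y)) ** n if y > 0 else 0.0

def R_cdf(n, x):
    """R_n(-infinity, x] = Q_n(-infinity, a_n * x + b_n] with a_n = 1 and b_n = log n."""
    return Q_cdf(n, x + math.log(n))

def limit_cdf(x):
    """The candidate limit x -> exp(-exp(-x))."""
    return math.exp(-math.exp(-x))

for n in (10, 1000, 10**6):
    print(n, [round(R_cdf(n, x) - limit_cdf(x), 8) for x in (-1.0, 0.0, 0.5, 2.0)])
```

Without the centering, $Q_n(-\infty, y] \to 0$ for every fixed $y$, so the uncentered sequence has no limit in the $\mathbb{R}$-setting.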


Problem 51. Let $(Q_n : n = 1,2,\dots)$ be any sequence of distributions on $\mathbb{R}$. Show
that for any $c \in \mathbb{R}$, scaling and centering constants can be chosen so that $R_n \to \delta_c$
as $n \to \infty$, where $R_n$ is defined as in the preceding paragraph. Show that it is
possible to choose the centering constants equal to 0 if $c = 0$ but that such a choice
for the centering constants is not in general possible if $c \ne 0$.


In view of Problem 51, we do not accomplish much if we choose scaling and
centering constants in such a way that the limit of the scaled and centered
sequence is of degenerate type. It is thus natural to ask: (i) Can $(a_n)$ and $(b_n)$
be chosen so that $(R_n)$ converges to a distribution on $\mathbb{R}$ that is not degenerate?
(ii) If so, how should $(a_n)$ and $(b_n)$ be chosen and, for such a choice, what is
the limit? (iii) In what sense, if any, is $\lim_n R_n$ unique when it exists and is
nondegenerate? The first two questions will be addressed in Chapters 15 and 17
as well as later in this chapter. The answer to the third question is a consequence
of the following preliminary result:


Lemma 23. Let $(V_n : n = 1,2,\dots)$ be a sequence of $\mathbb{R}$-valued random variables
that converges in distribution to an $\mathbb{R}$-valued random variable $V$ that is not
of degenerate type. Let $(a_n : n = 1,2,\dots)$ and $(b_n : n = 1,2,\dots)$ be sequences in
$(0,\infty)$ and $\mathbb{R}$, respectively. Then the sequence $((V_n - b_n)/a_n : n = 1,2,\dots)$ converges
in distribution to an $\mathbb{R}$-valued random variable that is not of degenerate
type if and only if the sequence $(a_n)$ converges to a member $a$ of $(0,\infty)$ and the
sequence $(b_n)$ converges to a member $b$ of $\mathbb{R}$, in which case

$$\frac{V_n - b_n}{a_n} \xrightarrow{\mathcal{D}} \frac{V - b}{a}.$$


PROOF. For each $n$, let $\beta_n$ be the characteristic function of $V_n$, and let $\beta$ be
the characteristic function of $V$. By Problem 7 of Chapter 13, the characteristic
function of $(V_n - b_n)/a_n$ is the function $\gamma_n$ defined by

$$\gamma_n(v) = e^{-ivb_n/a_n}\,\beta_n(v/a_n).$$

Clearly, if $a_n \to a \in (0,\infty)$ and $b_n \to b \in \mathbb{R}$ as $n \to \infty$, then $\gamma_n(v) \to
e^{-ivb/a}\,\beta(v/a)$, which is the characteristic function of $(V - b)/a$. So

$$\frac{V_n - b_n}{a_n} \xrightarrow{\mathcal{D}} \frac{V - b}{a},$$

as desired. Since $V$ is not of degenerate type, neither is $(V - b)/a$.

It remains to prove the ‘only if’ part. Let $\beta_n$, $\gamma_n$, and $\beta$ be as in the preceding
paragraph, and let $\gamma$ be the limit as $n \to \infty$ of $\gamma_n$. We assume that neither $\beta$ nor
$\gamma$ is a characteristic function of a degenerate distribution. We must show that
there exist constants $a > 0$ and $b \in \mathbb{R}$ such that $a = \lim_n a_n$ and $b = \lim_n b_n$.

Choose an increasing sequence $(n_k : k = 1,2,\dots)$ of positive integers such that
$a_{n_k} \to a$ and $b_{n_k} \to b$ as $k \to \infty$ for some constants $a \in [0,\infty]$ and $b \in [-\infty,\infty]$.
We will show that $a$ cannot be 0 or $\infty$, and that $b$ cannot be $\pm\infty$. Suppose first
that $a = \infty$. Since $|\gamma_n(v)| = |\beta_n(v/a_n)|$ for all $v \in \mathbb{R}$,

$$|\gamma(v)| = \lim_{k\to\infty} |\gamma_{n_k}(v)| = \lim_{k\to\infty} |\beta_{n_k}(v/a_{n_k})| = 1.$$

Let $R$ be the symmetrization of the distribution corresponding to $\gamma$. The
characteristic function of $R$ is $|\gamma|^2$, which is identically equal to 1, so $R = \delta_0$. It
follows that the distribution corresponding to $\gamma$ is degenerate. Since we are
assuming that this distribution is not degenerate, our argument shows that $a = \infty$
is impossible. The proof that $a \ne 0$ is carried out in a similar manner, using the
fact that $|\beta_n(v)| = |\gamma_n(a_nv)|$ for $v \in \mathbb{R}$.

Thus we have shown that $a$ cannot be 0 or $\infty$. Choose a bounded neighborhood
$N$ of the origin so that $\gamma(v) \ne 0$ and $\beta(v/a) \ne 0$ for $v \in N$. Then

$$e^{-ivb_{n_k}/a_{n_k}} = \frac{\gamma_{n_k}(v)}{\beta_{n_k}(v/a_{n_k})} \to \frac{\gamma(v)}{\beta(v/a)} \quad \text{as } k \to \infty$$

for $v \in N$. The left side is continuous and never equal to 0, so it has a unique
continuous logarithm that takes the value 0 at $v = 0$ (see Problem 7 of
Appendix E). This logarithm is obviously equal to $-ivb_{n_k}/a_{n_k}$. The limit on the
right side is also continuous and nonzero for $v \in N$, so it also has a unique
continuous logarithm that takes the value 0 at $v = 0$. The logarithm of the right
side equals the limit of the logarithm of the left side as $k \to \infty$, which is $-ivb/a$.
It follows that $b$ is finite. Furthermore, the first paragraph of this proof shows
that

$$\frac{V_{n_k} - b_{n_k}}{a_{n_k}} \xrightarrow{\mathcal{D}} \frac{V - b}{a} \quad \text{as } k \to \infty.$$

By hypothesis, the distribution of the random variable on the right has
characteristic function $\gamma$.

It remains to show that the original sequences $(a_n)$ and $(b_n)$ have the limits $a$
and $b$, respectively. The argument in the preceding two paragraphs shows that
every subsequence of $((a_n, b_n) : n \ge 1)$ has a further subsequence that converges
to a member of $(0,\infty) \times \mathbb{R}$ and that for any subsequential limit $(a',b')$, $\gamma$ is the
characteristic function of $(V - b')/a'$. Thus $(V - b')/a'$ has the same distribution
as $(V - b)/a$. It follows from Proposition 6 of Chapter 3 that $(a',b') = (a,b)$.
We have verified the conditions of Proposition 4 of Appendix B for the sequence
$((a_n, b_n) : n \ge 1)$. The conclusion is that $a_n \to a$ and $b_n \to b$ as $n \to \infty$. $\square$


* Problem 52. Give an example that shows that the hypothesis in the preceding 
proposition that V not be of degenerate type may neither be removed nor even be 
replaced by the hypothesis that V not be a.s. equal to 0. 


Theorem 24. [Convergence of Types] Let $(V_n : n = 1,2,\dots)$ be a sequence of
$\mathbb{R}$-valued random variables, and suppose, for some $a_n \in (0,\infty)$ and $b_n \in \mathbb{R}$, $n =
1,2,\dots$, that the sequence $((V_n - b_n)/a_n : n = 1,2,\dots)$ converges in distribution
to an $\mathbb{R}$-valued random variable $Y$ that is not of degenerate type. Let $(a_n' : n =
1,2,\dots)$ and $(b_n' : n = 1,2,\dots)$ be sequences in $(0,\infty)$ and $\mathbb{R}$, respectively. Then
the sequence $((V_n - b_n')/a_n' : n = 1,2,\dots)$ converges in distribution to an $\mathbb{R}$-valued
random variable $Y'$ that is not of degenerate type if and only if $a_n'/a_n$ converges
to a constant $a' \in (0,\infty)$ and $(b_n' - b_n)/a_n$ converges to a constant $b' \in \mathbb{R}$ as
$n \to \infty$. In this case, $Y'$ has the same distribution as $(Y - b')/a'$.


Problem 53. Use Lemma 23 to prove the preceding theorem. 


The preceding theorem makes it natural to speak of the limit type (if it exists) 
of a given sequence, implicitly excluding the degenerate type. To avoid any 
ambiguity one sometimes uses the term nondegenerate limit type. If it happens 
that with a certain scaling and centering one gets convergence to a constant 
random variable, one tries to find another scaling and centering that gives a 
nondegenerate limit type. The term normalization is often used to describe the 
process of scaling and centering. 

Sometimes one is willing to allow scaling but not centering. Then, in addi- 
tion to nondegenerate limit types, degenerate limits are also of interest, with 
the exception of the delta distribution at 0. In this setting, one speaks of the 
strict limit type of a sequence (if it exists). The following result justifies this 
terminology. 


Theorem 25. [Convergence of Strict Types] Let $(V_n : n = 1,2,\dots)$ be a
sequence of $\mathbb{R}$-valued random variables, and suppose, for some $a_n \in (0,\infty)$,
$n = 1,2,\dots$, that the sequence $(V_n/a_n : n = 1,2,\dots)$ converges in distribution to
an $\mathbb{R}$-valued random variable $Y$ that is not a.s. equal to 0. Let $(a_n' : n = 1,2,\dots)$
be a sequence of positive numbers. Then the sequence $(V_n/a_n' : n = 1,2,\dots)$
converges in distribution to an $\mathbb{R}$-valued random variable $Y'$ that is not a.s.
equal to 0 if and only if $a_n'/a_n$ converges to a constant $a' \in (0,\infty)$ as $n \to \infty$.
In this case, $Y'$ has the same distribution as $Y/a'$.


PROOF. The proof of the ‘if’ portion and the last sentence of the theorem is
just as in the Convergence of Types Theorem. That theorem also implies the
‘only if’ part if either $Y$ or $Y'$ has a nondegenerate distribution. It remains to
prove the ‘only if’ part under the assumption that both $Y$ and $Y'$ are almost
surely equal to a nonzero constant. For this part of the proof, we revisit the
proof of the ‘only if’ portion of Lemma 23.

Let $\beta_n$ be the characteristic function of $V_n/a_n$, and let $\beta$ and $\gamma$ be the
characteristic functions of $Y$ and $Y'$, respectively. Then by Problem 7 of Chapter 13,
the characteristic function of $V_n/a_n'$ is the function $v \mapsto \beta_n(a_nv/a_n')$. By
hypothesis,

$$\lim_{n\to\infty} \beta_n(v) = \beta(v) \quad \text{and} \quad \lim_{n\to\infty} \beta_n(a_nv/a_n') = \gamma(v)$$

for all $v \in \mathbb{R}$. Now the same argument used in the proof of the ‘only if’ portion
of Lemma 23 applies to show that the sequence $(a_n'/a_n : n = 1,2,\dots)$ has a limit
$a' \in (0,\infty)$. $\square$


Problem 54. Let $(X_n : n = 1,2,\dots)$ be an iid sequence of $\mathbb{R}$-valued random
variables with nonzero finite mean. For every $c \ne 0$, characterize all sequences
$(a_n : n = 1,2,\dots)$ of positive constants such that

$$\frac{X_1 + \cdots + X_n}{a_n} \xrightarrow{\mathcal{D}} c \quad \text{as } n \to \infty.$$

Note: When the mean is 0 and the variance is finite, no such sequences exist. If
the variance is infinite, such sequences may or may not exist. Further information
on the case with mean 0 may be found in Chapters 15 and 16.


14.9. Characterization of moment generating functions 


Theorem 14 of Chapter 5 identifies those functions that are probability generating
functions of $\overline{\mathbb{Z}}^+$-valued random variables. We will now use that theorem in
conjunction with Theorem 19 of this chapter to identify those functions that are
moment generating functions.

Let $\varphi$ be the moment generating function of a probability measure $Q$ on $\mathbb{R}^+$:

$$\varphi(u) = \int_{\mathbb{R}^+} e^{-ux}\,Q(dx), \quad u \in \mathbb{R}^+.$$

By Problem 38 of Chapter 13, $\varphi$ is continuous, and, for $u > 0$, the $k$th derivative
of $\varphi$ at $u$ is given by

$$\varphi^{(k)}(u) = (-1)^k \int_{\mathbb{R}^+} x^k e^{-ux}\,Q(dx).$$
R+ 


An $\mathbb{R}$-valued function defined on an interval is said to be completely monotone
if it and all its even-order derivatives (including the function itself as its
own derivative of order 0) are nonnegative on the interval and all its odd-order
derivatives are nonpositive on the interval. From the preceding discussion we see
that moment generating functions of probability measures on $\mathbb{R}^+$ are completely
monotone on $(0,\infty)$ and continuous at 0 with the value 1 there. This fact and
its converse constitute a portion of the next theorem.


Theorem 26. A function $\varphi\colon \mathbb{R}^+ \to (0,\infty)$ is the moment generating function
of a probability measure on $\mathbb{R}^+$ if and only if $\varphi$ is completely monotone on $(0,\infty)$,
continuous at 0, and $\varphi(0) = 1$, in which case the corresponding probability measure
is the limit of the sequence $(Q_n : n = 1,2,\dots)$, where $Q_n$ is the probability
measure supported by $\{k/n : k \in \mathbb{Z}^+\}$ and given by

(14.5)    $Q_n\{k/n\} = \dfrac{(-n)^k\,\varphi^{(k)}(n)}{k!}, \quad k \in \mathbb{Z}^+,$

where $\varphi^{(k)}$ denotes the $k$th derivative of $\varphi$ when $k = 1,2,\dots$, and $\varphi$ itself when
$k = 0$.


PROOF. The discussion preceding the theorem proves the ‘only if’ assertion.
To complete the proof assume that $\varphi$ is completely monotone on $(0,\infty)$ and
continuous at 0 with the value 1 there. It then follows that (14.5) defines a
$\sigma$-finite measure on $\{k/n : k \in \mathbb{Z}^+\}$.

Let

$$\rho_n(s) = \varphi(n - sn), \quad 0 \le s \le 1.$$

From the continuity and complete monotonicity of $\varphi$, we conclude that $\rho_n$ is
continuous with nonnegative derivatives on $(0,1)$. By Theorem 14 of Chapter 5
and the continuity of $\rho_n$ at 1, we conclude that

$$\rho_n(s) = \sum_{k=0}^{\infty} \frac{\rho_n^{(k)}(0)}{k!}\,s^k = \sum_{k=0}^{\infty} \frac{(-n)^k\,\varphi^{(k)}(n)}{k!}\,s^k.$$

Since the left side equals 1 when $s = 1$, so does the right side. Thus, (14.5)
defines a probability measure, the moment generating function of which can be
obtained by inserting $e^{-u/n}$ for $s$:

$$u \mapsto \varphi\bigl(n(1 - e^{-u/n})\bigr).$$

As $n \to \infty$, this function converges to $\varphi$. An appeal to Theorem 19 completes
the proof. $\square$
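The construction in (14.5) can be carried out explicitly for $\varphi(u) = 1/(1+u)$, which is $\int_0^\infty e^{-ux}e^{-x}\,dx$, the moment generating function (in the convention $u \mapsto \int e^{-ux}\,Q(dx)$) of the standard exponential distribution. Here $\varphi^{(k)}(u) = (-1)^k k!/(1+u)^{k+1}$, so (14.5) gives $Q_n\{k/n\} = n^k/(n+1)^{k+1}$, a geometric distribution on the lattice $\{k/n\}$. The sketch below, whose function names and sample values are our own, checks that the masses sum to 1 and that $Q_n[0,x]$ approaches $1 - e^{-x}$.

```python
import math

def qn_mass(n, k):
    """Q_n{k/n} from (14.5) for phi(u) = 1/(1+u): equals n^k / (n+1)^(k+1)."""
    return (n / (n + 1.0)) ** k / (n + 1.0)

def qn_cdf(n, x):
    """Q_n[0, x], obtained by summing the atoms k/n with k/n <= x."""
    return sum(qn_mass(n, k) for k in range(int(math.floor(n * x)) + 1))

# Total mass of Q_50, truncated far out in the tail.
print(sum(qn_mass(50, k) for k in range(5000)))

# Comparison with the standard exponential distribution function at x = 1.
for n in (10, 100, 1000):
    print(n, qn_cdf(n, 1.0), 1.0 - math.exp(-1.0))
```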


Problem 55. State and prove an analogue of the preceding theorem for probability
measures on $\overline{\mathbb{R}}^+$ and their moment generating functions.


Problem 56. Prove that a bounded completely monotone function on (0,00) is 
either constant or else has the property that neither it nor any of its derivatives is 
0 in the interval. Can the hypothesis of boundedness be removed? Hint: Use the 
preceding theorem. 




Problem 57. Let $\varphi$ and $\psi$ be two continuous functions on $[0,\infty)$ that are
completely monotone on $(0,\infty)$. Prove twice that $\varphi\psi$ is completely monotone on
$(0,\infty)$ using different methods.


Problem 58. Show that the moment generating function of any gamma distribution
on $\mathbb{R}^+$ has the form

$u \mapsto \exp\left(-\int_0^u \psi(t)\,dt\right), \quad 0 \le u < \infty,$

for some completely monotone function $\psi$ on $(0,\infty)$.


The next theorem gives a general framework into which the preceding problem 
fits. 


Theorem 27. There is a one-to-one correspondence between the set of infinitely
divisible distributions on $[0,\infty)$ and the set of completely monotone functions
$\psi$ on $(0,\infty)$ having finite improper Riemann integral on $(0,1]$. The distribution
corresponding to $\psi$ is the one whose moment generating function is

(14.6) $u \mapsto \exp\left(-\int_0^u \psi(t)\,dt\right), \quad 0 \le u < \infty.$

(The integral is improper because $\psi(0)$ is not necessarily defined; it may or may
not be the case that $\psi(0+) < \infty$, in which case $\psi(0)$ could be defined in order to
change the improper integral into a proper integral.)
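
To make formula (14.6) concrete, here is a small Python sketch that integrates two completely monotone functions numerically and compares the result with a closed form: $\psi(t) = 1/(1+t)$, for which $\psi(0+)$ is finite and (14.6) gives $1/(1+u)$, and $\psi(t) = 1/(2\sqrt{t})$, for which $\psi(0+) = \infty$ and (14.6) gives $e^{-\sqrt{u}}$. The helper `mgf_from_psi` and the midpoint rule are illustrative choices of ours, not something prescribed by the theorem.

```python
import math

def mgf_from_psi(psi, u: float, steps: int = 20000) -> float:
    """Approximate exp(-integral_0^u psi(t) dt) with the midpoint rule.

    The midpoint rule never evaluates psi at 0, so it tolerates an
    integrable singularity there (at the cost of slower convergence).
    """
    h = u / steps
    integral = h * sum(psi((j + 0.5) * h) for j in range(steps))
    return math.exp(-integral)

def psi_bounded(t: float) -> float:
    """psi(t) = 1/(1+t); psi(0+) is finite, so the integral is proper."""
    return 1.0 / (1.0 + t)

def psi_singular(t: float) -> float:
    """psi(t) = 1/(2 sqrt(t)); psi(0+) is infinite but integrable near 0."""
    return 0.5 / math.sqrt(t)

u = 2.0
print(mgf_from_psi(psi_bounded, u), 1.0 / (1.0 + u))
print(mgf_from_psi(psi_singular, u), math.exp(-math.sqrt(u)))
```

Dividing either $\psi$ by $n$ gives the $n$th root of the corresponding moment generating function, which is the mechanism behind infinite divisibility in the theorem.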


PROOF. Let $\psi$ be a completely monotone function on $(0,\infty)$ whose integral
near 0 is finite, and denote by $\varphi$ the function (14.6). It is clear that $\varphi(0) = 1$
and $\varphi$ is continuous at 0. We will prove by mathematical induction that the $m$th
derivative of $\varphi$ is the product of a completely monotone function and $(-1)^m \varphi$.
This is obvious for $m = 0$ and we assume it to be true for $m = k$. Thus, $\varphi^{(k)} =
\eta_k (-1)^k \varphi$, where $\eta_k$ is completely monotone on $(0,\infty)$. Then

$\varphi^{(k+1)} = \eta_k \psi\, (-1)^{k+1} \varphi + [-\eta_k']\, (-1)^{k+1} \varphi.$

By Problem 57, $\eta_k \psi$ is completely monotone. Clearly, $-\eta_k'$ is completely
monotone. The observation that the sum of completely monotone functions is
completely monotone completes the induction proof. In particular, $\varphi^{(m)}$ is
nonnegative if $m$ is even and nonpositive if $m$ is odd. Theorem 26 thus applies to show
that $\varphi$ is the moment generating function of a distribution on $\mathbb{R}^+$.

The $n$th root of $\varphi$ is

$u \mapsto \exp\left(-\int_0^u \dfrac{\psi(t)}{n}\,dt\right), \quad 0 \le u < \infty,$

which is also the moment generating function of a distribution on $\mathbb{R}^+$, because
$\psi/n$ is completely monotone. Therefore $\varphi$ is infinitely divisible.

For the converse assume that $\varphi$ is the moment generating function of an
infinitely divisible distribution on $\mathbb{R}^+$. Consider the functions

(14.7) $u \mapsto n\bigl([\varphi(u)]^{1/n} - 1\bigr)$




for various positive integers $n$. By Theorem 26 the function $u \mapsto [\varphi(u)]^{1/n}$ is
completely monotone for each $n$. So, for $m \ge 1$, the $m$th derivative of (14.7) is
nonnegative or nonpositive according as $m$ is even or odd. The limit as $n \to \infty$
equals $\log\circ\varphi$, so we hope that for $m \ge 1$, the $m$th derivative of $\log\circ\varphi$ is
nonnegative or nonpositive according as $m$ is even or odd. Were this the case,
we could complete the proof by setting $\psi = -[\log\circ\varphi]'$. Even though derivatives of
limits do not in general equal limits of derivatives, an induction argument based
on the fact that the various derivatives of the approximating functions are each
of constant sign can be used to show that the above-mentioned hope is fulfilled
in the situation at hand. □


14.10. Characterization of characteristic functions 


Theorem 13 of Chapter 13 identifies those functions that are characteristic func- 
tions of Z-valued random variables, and Theorem 14 of the same chapter identi- 
fies those functions with finite Lebesgue integral that are characteristic functions 
of R-valued random variables. In order to identify all functions that are char- 
acteristic functions of R-valued random variables, we need a fact about positive 
definite functions. 


Lemma 28. Every positive definite function is bounded. 


PROOF. Let $\beta$ be positive definite. By positive definiteness, $\beta(0 - 0) z \bar z \ge 0$
for all $z \in \mathbb{C}$. Hence $\beta(0) \ge 0$.
Again using the definition of positive definiteness, we obtain

(14.8) $\beta(0 - 0) z_1 \bar z_1 + \beta(0 - v) z_2 \bar z_1 + \beta(v - 0) z_1 \bar z_2 + \beta(v - v) z_2 \bar z_2 \ge 0.$

By setting $z_1 = 1$ and $z_2 = 1 + ci$, $c \in \mathbb{R}$, and using the fact that $\beta(0)$ is real,
we conclude that $(1 + ci)\beta(-v) + (1 - ci)\beta(v)$ is real. By considering $c = 0$ and
then $c = 1$, we see that $\beta(-v) = \overline{\beta(v)}$.

Now we use (14.8) four times with $z_1 = 1$: once with $z_2 = 1$; then with
$z_2 = -1$; third with $z_2 = i$; and last with $z_2 = -i$. We deduce that the following
four numbers are no larger than $2\beta(0)$: twice the real part of $-\beta(v)$; twice the real
part of $\beta(v)$; twice the imaginary part of $\beta(v)$; and twice the imaginary part of
$-\beta(v)$. We conclude that $|\beta(v)| \le \beta(0)\sqrt{2}$ for all $v$ and that, therefore, $\beta$ is a
bounded function. □
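
The definition of positive definiteness used in this proof can be explored numerically. The sketch below, in Python, takes the known characteristic function $v \mapsto e^{-|v|}$ of a Cauchy distribution as a test case and evaluates the double sum $\sum_k \sum_j \beta(v_k - v_j) z_j \bar z_k$ for randomly chosen points and coefficients. The random test design and the function names are assumptions of this example; a finite experiment of this kind can only fail to refute positive definiteness, it cannot prove it.

```python
import math
import random

def beta(v: float) -> float:
    """Characteristic function v -> exp(-|v|) of a Cauchy distribution."""
    return math.exp(-abs(v))

def quadratic_form(vs, zs) -> complex:
    """The double sum over k and j of beta(v_k - v_j) * z_j * conj(z_k)."""
    return sum(beta(vk - vj) * zj * zk.conjugate()
               for vk, zk in zip(vs, zs)
               for vj, zj in zip(vs, zs))

random.seed(0)
smallest_real = float("inf")
largest_imag = 0.0
for _ in range(200):
    n = random.randint(1, 6)
    vs = [random.uniform(-5.0, 5.0) for _ in range(n)]
    zs = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]
    q = quadratic_form(vs, zs)
    smallest_real = min(smallest_real, q.real)
    largest_imag = max(largest_imag, abs(q.imag))

# The form should be real and nonnegative, up to rounding error.
print(smallest_real, largest_imag)

# The bound of Lemma 28, |beta(v)| <= beta(0) * sqrt(2), on a grid of v.
bound_holds = all(abs(beta(0.1 * j)) <= beta(0.0) * math.sqrt(2.0)
                  for j in range(-100, 101))
print(bound_holds)
```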


Theorem 29. A function $\beta\colon \mathbb{R} \to \mathbb{C}$ is the characteristic function of some
R-valued random variable if and only if it is continuous and positive definite,
and satisfies $\beta(0) = 1$.


PROOF. The content of Proposition 12 of Chapter 13 is that a characteristic
function is continuous, positive definite, and has the value 1 at 0. For the
converse, we assume that $\beta$ is continuous and positive definite and that $\beta(0) = 1$.

For $b > 0$, let

$\gamma_b(v) = e^{-b|v|} \beta(v).$




By Lemma 28, $\beta$ is a bounded function and, therefore, each $\gamma_b$ has finite integral.
Recalling that $v \mapsto e^{-b|v|}$ is the characteristic function of a Cauchy distribution,
we write

$\gamma_b(v) = \int_{-\infty}^{\infty} e^{ivx} \beta(v)\, \dfrac{b}{\pi(b^2 + x^2)}\,dx.$

Hence, for any choice of $n$ and $v_1,\dots,v_n \in \mathbb{R}$ and $z_1,\dots,z_n \in \mathbb{C}$,

$\sum_{k=1}^{n} \sum_{j=1}^{n} \gamma_b(v_k - v_j) z_j \bar z_k = \int_{-\infty}^{\infty} \dfrac{b}{\pi(b^2 + x^2)} \left[ \sum_{k=1}^{n} \sum_{j=1}^{n} \beta(v_k - v_j) \bigl(e^{-iv_j x} z_j\bigr) \overline{\bigl(e^{-iv_k x} z_k\bigr)} \right] dx.$

The integrand is nonnegative because $\beta$ is positive definite. Hence, the integral
is nonnegative and, therefore, $\gamma_b$ is positive definite. Clearly, $\gamma_b$ inherits the
properties of being continuous and having the value 1 at 0 from $\beta$.

On the basis of the preceding paragraph we may apply Theorem 14 of Chapter 13
to conclude that each $\gamma_b$ is a characteristic function. Let $b \searrow 0$ and apply
Theorem 15 of this chapter to conclude that $\beta$ is a characteristic function. □


Let $\beta$ be some continuous positive definite function for which $\beta(0) = 1$. By
the preceding theorem it is the characteristic function of some distribution $Q$ on
R. The inversion formulas given in Theorem 13 and Theorem 14 of Chapter 13
conveniently describe $Q$ in terms of $\beta$ in case $\beta$ is periodic or has finite integral.
The above proof yields a description in general: $Q = \lim_{b \searrow 0} Q_b$, where $Q_b$ is the
distribution with density

$x \mapsto \dfrac{1}{2\pi} \int_{-\infty}^{\infty} e^{-ivx} \gamma_b(v)\,dv.$

We could have used characteristic functions (with finite integral) other than
$v \mapsto e^{-b|v|}$ in the proof of the preceding formula, and so the preceding sentence
remains valid for a wide variety of definitions of $\gamma_b$.


Problem 59. Find a distribution whose characteristic function is neither periodic 
nor has finite integral. Then check your conclusion by actually calculating the 
characteristic function. 


Problem 60. Let $(\Psi, \mathcal{G}, R)$ be a probability space. For each $\psi \in \Psi$, let $\beta_\psi$ be a
characteristic function and assume, for each $v$, that $\psi \mapsto \beta_\psi(v)$ is measurable. For
each $v \in \mathbb{R}$ set

$\gamma(v) = \int_{\Psi} \beta_\psi(v)\, R(d\psi).$

Prove that $\gamma$ is a characteristic function. Describe a two-step experiment that is
related to this problem. 


CHAPTER 15 


Distributional Limit Theorems 
for Partial Sums 


In this chapter we study convergence in distribution in settings involving
sequences $(S_n \colon n = 1,2,\dots)$, where for each $n$, $S_n = X_1 + \cdots + X_n$ is the $n$th
partial sum of a series of independent random variables. Our first result is that
convergence in distribution of $(S_n)$ is equivalent to a.s. convergence. Thereafter,
we specialize to the case in which $(X_1, X_2, \dots)$ is an iid sequence. Further limit
theorems involving more general sums of independent random variables will be 
found in Chapter 16. 

For the case of iid summands, we first look at the convergence in distribution of
$S_n/n$, and obtain necessary and sufficient conditions for a Law of Large Numbers:
for all $\varepsilon > 0$,

(15.1) $\lim_{n\to\infty} P\left[\left|\dfrac{S_n}{n} - c\right| > \varepsilon\right] = 0,$


where c is some constant. Unlike the Strong Law of Large Numbers, this result 
applies in some cases in which the mean does not exist. In general, it is difficult 
to obtain good estimates for the rate of convergence in (15.1). But in certain 
special cases, we can derive very useful and precise information about this rate, 
the so-called ‘large deviations estimates’ treated later in the chapter. 


In Section 3, we examine the issue of convergence in distribution of $(S_n -
b_n)/a_n$ for constants $a_n > 0$ and $b_n$. If the summands $X_k$ have finite mean and
variance the constants can be chosen to give the standard normal distribution as 
a limit, whatever the distribution of the summands. This result is known as the 
Classical Central Limit Theorem. Some related ‘local limit theorems’ are proved 
in the last section of the chapter. 

When the mean or variance is not finite, other limiting distributions known as
‘stable’ distributions are possible limits for normalized sums of the form $(S_n -
b_n)/a_n$. Some basic results concerning convergence to stable distributions are
contained in this chapter. Chapter 17 contains deeper results along these lines. 




15.1. Infinite series of independent random variables 


Let $(X_k \colon k = 1,2,3,\dots)$ be a sequence of independent R-valued random
variables and denote the characteristic function of $X_k$ by $\beta_k$. When we say that
the infinite series $\sum_{k=1}^{\infty} X_k$ converges in distribution, we mean that the sequence of
partial sums $(S_n \colon n = 1,2,\dots)$ converges in distribution as $n \to \infty$. By the
independence of the summands and the Continuity Theorem of Chapter 14,
convergence of this sequence is equivalent to the convergence of the infinite product
of characteristic functions to an appropriate limit:

(15.2) $\prod_{k=1}^{\infty} \beta_k(v) = \gamma(v), \quad v \in \mathbb{R},$

for some function $\gamma$ that is continuous at 0. In case $\gamma$ is such a function, $\gamma$ is the
characteristic function of the limiting distribution. We will prove that in this
case, the series $\sum_{k=1}^{\infty} X_k$ actually converges almost surely to a random variable
$S$. Thus, for infinite series of independent random variables, a.s. convergence,
convergence in probability, and convergence in distribution are equivalent.
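
A concrete instance of (15.2), chosen by us for illustration, is the series with $P[X_k = 2^{-k}] = P[X_k = -2^{-k}] = \frac12$. Each $\beta_k(v) = \cos(v/2^k)$, and the classical identity $\prod_{k \ge 1} \cos(v/2^k) = \sin(v)/v$ identifies the limit as the characteristic function of the uniform distribution on $(-1,1)$. The Python sketch below compares partial products with this limit; the function names are hypothetical.

```python
import math

def partial_product(v: float, n: int) -> float:
    """Product of cos(v / 2^k) for k = 1..n: the characteristic function
    of S_n when X_k = +/- 2^{-k} with probability 1/2 each."""
    p = 1.0
    for k in range(1, n + 1):
        p *= math.cos(v / 2.0 ** k)
    return p

def gamma(v: float) -> float:
    """Characteristic function sin(v)/v of the uniform distribution on (-1, 1)."""
    return 1.0 if v == 0 else math.sin(v) / v

for v in (0.0, 0.5, 3.0, 10.0):
    print(v, partial_product(v, 5), partial_product(v, 40), gamma(v))
```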


* Problem 1. Suppose that $(X_1, X_2, \dots)$ is an independent sequence for which

$P[X_k = -m6^{-k}] = P[X_k = m6^{-k}] = \tfrac{1}{6}$ for $m = 1,3,5$.

Decide if $\sum_{k=1}^{\infty} X_k$ converges in distribution, and if so, calculate its characteristic
function and its distribution. Also, decide if $\sum X_k$ is almost surely convergent.
Which of your conclusions remain valid if the independence assumption is dropped?


Problem 2. Suppose that $P[X_k = -k^{-1/2}] = P[X_k = k^{-1/2}] = \tfrac{1}{2}$. Decide if $\sum X_k$
converges in distribution.


Theorem 1. Let $(X_k \colon k = 1,2,\dots)$ be a sequence of independent R-valued
random variables. Then

$\sum_{k=1}^{\infty} X_k$

either converges in distribution, in probability, and almost surely, or else it
diverges in all three senses.


PROOF. It is true in general that a.s. convergence implies convergence in
probability and therefore convergence in distribution. By Theorem 22 of Chapter 12
convergence in probability of an infinite series of independent random variables
implies its almost sure convergence. Thus it remains to prove that convergence
in distribution implies convergence in probability, or equivalently, that
convergence in distribution implies Cauchy in probability. For the sequence of partial
sums $S_n$, Cauchy in probability means that for any sequence of positive integers
$(m_n \colon n = 1,2,\dots)$,

(15.3) $\lim_{n\to\infty} (X_{n+1} + \cdots + X_{n+m_n}) = 0$ i.p.




We will show that convergence in distribution of the partial sums $S_n$ as $n \to \infty$
implies (15.3).

As indicated earlier, convergence in distribution implies that (15.2) holds for
some characteristic function $\gamma$, where $\beta_k$ denotes the characteristic function of
$X_k$. Thus, if $v$ is a real number such that $\gamma(v) \ne 0$, then $\beta_k(v) \ne 0$ for all $k$.
Therefore, since $\gamma(0) = 1$ and $\gamma$ is continuous, there exists an open interval $B$
containing 0 such that neither $\gamma$ nor any $\beta_k$ is 0 at any point in $B$. Hence

$\lim_{n\to\infty} \prod_{k=n+1}^{n+m_n} \beta_k(v) = \dfrac{\gamma(v)}{\lim_{n\to\infty} \prod_{k=1}^{n} \beta_k(v)} \cdot \dfrac{1}{\lim_{n\to\infty} \prod_{k=n+m_n+1}^{\infty} \beta_k(v)} = 1$

for $v \in B$.

For each $n$, the product on the left side of this last expression is the characteristic
function of $\sum_{k=n+1}^{n+m_n} X_k$, which we denote by $\alpha_n$. We have shown that $\alpha_n(v) \to 1$
as $n \to \infty$ for every $v \in B$. By Lemma 16 of Chapter 14, $\alpha_n(v) \to 1$ as $n \to \infty$ for
every $v \in \mathbb{R}$.

Hence, $\sum_{k=n+1}^{n+m_n} X_k \xrightarrow{\mathcal{D}} 0$. So (15.3) now follows from Proposition 4 of
Chapter 14. □


15.2. The Law of Large Numbers revisited 


For the remainder of the chapter, we focus our attention on normalized partial
sums of iid sequences $(X_n \colon n = 1,2,\dots)$. In this section, we concern ourselves
with the convergence of $\frac{1}{n}\sum_{k=1}^{n} X_k$ as $n \to \infty$. Recall from Chapter 12 that
there is almost sure convergence to a finite limit if and only if $E(|X_1|) < \infty$.
Problem 42 of Chapter 13 and the following theorem show that this condition is 
not necessary for convergence in probability. 


Theorem 2. [Law of Large Numbers] Let $\beta$ denote the common characteristic
function of the terms of an iid sequence $(X_n \colon n = 1,2,\dots)$ of R-valued
random variables. Then

(15.4) $\lim_{n\to\infty} \dfrac{\sum_{k=1}^{n} X_k}{n} = c$ i.p.

for some constant $c \in \mathbb{R}$ if and only if $\beta'(0)$ exists, in which case $c = -i\beta'(0)$.

PROOF. The characteristic function of $\frac{1}{n}\sum_{k=1}^{n} X_k$ is the function

$v \mapsto \beta^n(v/n).$

In view of the Continuity Theorem and Proposition 4 of Chapter 14, the assertion
of the theorem is the same as the following statement: $\beta'(0)$ exists if and only
if there exists a real constant $c$ such that

$\beta^n(v/n) \to e^{icv}$ as $n \to \infty$,

in which case $\beta'(0) = ic$.




By the continuity of $\beta$, there exists some positive number $b$ such that $\beta(w) \ne 0$
for $|w| < b$. For $n > |v|/b$ we may write

$\beta^n\left(\tfrac{v}{n}\right) = \exp\left(n(\log\circ\beta)\left(\tfrac{v}{n}\right)\right).$

(See Problem 7 of Appendix E for the definition of $\log\circ\beta$.) It will be useful to
make the substitution $w = \frac{v}{n}$. Thus,

(15.5) $\beta^n\left(\tfrac{v}{n}\right) = \exp\left(v\,\tfrac{1}{w}(\log\circ\beta)(w)\right),$

provided that $|w| < b$. We also need the consequence

$\lim_{w\to 0} \dfrac{(\log\circ\beta)(w)}{\beta(w) - 1} = 1$

of Problem 13 of Appendix E and the comment following that problem. We
rewrite the preceding equality as follows:

(15.6) $\lim_{w\to 0} \dfrac{(\log\circ\beta)(w)/w}{(\beta(w) - 1)/w} = 1.$

It follows from (15.6) that if $\beta'(0)$ exists, then the limit as $n \to \infty$ of (15.5)
equals $e^{v\beta'(0)}$ because for each fixed $v$, $|w| = |v|/n < b$ for all sufficiently large $n$.
Thus, the ‘if’ part of the proof is completed, as is the assertion that $ic = \beta'(0)$.

To complete the ‘only if’ portion of the proof, we assume that $\beta^n(v/n) \to e^{icv}$
for all $v \in \mathbb{R}$ as $n \to \infty$, and show that $\beta'(0) = ic$. Since $v \mapsto e^{icv}$ is a
characteristic function, the Continuity Theorem implies that the convergence of $\beta^n(v/n)$
is uniform for $v \in [-1,1]$. Keeping in mind that $\beta(v/n) \ne 0$ for $v \in [-1,1]$ and
sufficiently large $n$, we may take logarithms and conclude that

(15.7) $n(\log\circ\beta)\left(\tfrac{v}{n}\right) \to icv$ as $n \to \infty$, uniformly for $v \in [-1,1]$.

For each nonzero $w \in [-1,1]$ there is a $v_n$ satisfying $1 - \frac{1}{n} < |v_n| \le 1$ and a positive
integer $n$ such that $w = \frac{v_n}{n}$, and no matter how such choices are made, $w \to 0$
implies that $n \to \infty$. By the uniform convergence in (15.7), $\frac{1}{w}(\log\circ\beta)(w) \to ic$ as
$w \to 0$. It now follows from (15.6) that $\beta'(0) = ic$. □
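
The dichotomy in Theorem 2 can be seen numerically by evaluating $\beta^n(v/n)$ for two familiar characteristic functions. In the Python sketch below, the standard exponential distribution has $\beta(v) = 1/(1 - iv)$ with $\beta'(0) = i$, so $\beta^n(v/n)$ should approach $e^{iv}$ (that is, $c = 1$); the Cauchy characteristic function $e^{-|v|}$ is not differentiable at 0, and $\beta^n(v/n)$ does not move at all. The two examples and the function names are our choices for illustration.

```python
import cmath

def beta_exp(v: float) -> complex:
    """Characteristic function 1/(1 - iv) of the standard exponential distribution."""
    return 1.0 / (1.0 - 1j * v)

def beta_cauchy(v: float) -> complex:
    """Characteristic function v -> exp(-|v|) of a Cauchy distribution."""
    return cmath.exp(-abs(v))

def chf_of_mean(beta, v: float, n: int) -> complex:
    """Characteristic function beta(v/n)^n of (X_1 + ... + X_n)/n."""
    return beta(v / n) ** n

v = 2.0
for n in (10, 1000, 100000):
    gap_exp = abs(chf_of_mean(beta_exp, v, n) - cmath.exp(1j * v))
    gap_cauchy = abs(chf_of_mean(beta_cauchy, v, n) - beta_cauchy(v))
    print(n, gap_exp, gap_cauchy)
```

The first gap shrinks as $n$ grows, while the second is zero up to rounding for every $n$: in the Cauchy case the limit is a nonconstant random variable rather than a constant.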


The Law of Large Numbers concerns the convergence of $(S_n/n)$ to a constant.
As we have already seen, for such a limit it does not matter whether we talk about
convergence in distribution or convergence in probability. By the Kolmogorov
0-1 Law, if we also insist on almost sure convergence, then convergence to a
constant is the only possibility. But as the following exercise shows, in the
case of convergence in distribution, it is possible for $(S_n/n)$ to converge to a
nonconstant R-valued random variable.




Problem 3. In the setting of the preceding paragraph, suppose that the distribution
of $X_1$ is Cauchy with characteristic function $v \mapsto \exp(-|v|)$. Show that $S_n/n$
has the same distribution as $X_1$, and so,

$\dfrac{S_n}{n} \xrightarrow{\mathcal{D}} X_1.$


15.3. The Classical Central Limit Theorem 


In the Law of Large Numbers, convergence in distribution is obtained by dividing
the $n$th partial sum $S_n$ by $n$. We turn more generally to the issue of convergence
in distribution of sequences of the form

$\left(\dfrac{S_n - b_n}{a_n}\right),$

where $a_n$ and $b_n$ are constants. Special cases have already been treated in
Problem 54 and Problem 55 of Chapter 13. The following theorem is the most famous
general result along these lines.


Theorem 3. [Classical Central Limit] Let $(X_n \colon n \ge 1)$ be an iid sequence of
R-valued random variables having finite mean and finite nonzero variance. For
each $n$, let

$Z_n = \dfrac{\sum_{k=1}^{n} X_k - nE(X_1)}{\sqrt{n \operatorname{Var}(X_1)}}.$

Then, for each $x \in \mathbb{R}$,

$\lim_{n\to\infty} P[Z_n \le x] = \dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2/2}\,dy;$

that is, as $n \to \infty$,

$Z_n \xrightarrow{\mathcal{D}} Z,$

where $Z$ is a normally distributed random variable with mean 0 and variance 1.


PROOF. Let $\beta$ denote the characteristic function of $X_1$. Since $X_1$ has finite
mean and variance, both $\beta'(0)$ and $\beta''(0)$ exist:

$\beta'(0) = iE(X_1),$
$\beta''(0) = -E(X_1^2).$

The characteristic function of $\sum_{k=1}^{n} X_k$ is $\beta^n$. Since $E(X_1) = -i\beta'(0)$, we
conclude from Problem 7 of Chapter 13 that the characteristic function of $Z_n$ is
the function

$v \mapsto \beta^n\left(\dfrac{v}{\sqrt{n \operatorname{Var}(X_1)}}\right) e^{-v\sqrt{n}\,\beta'(0)/\sqrt{\operatorname{Var}(X_1)}}.$

We will complete the proof by showing pointwise convergence of this sequence
of functions to the characteristic function $v \mapsto e^{-v^2/2}$ of the standard normal
distribution as $n \to \infty$.




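We use Problem 7 of Appendix E as we work with logarithms of the relevant
quantities, noting that for fixed $v$, these quantities are nonzero for sufficiently
large $n$. As $n \to \infty$, it follows from Proposition 3 of Appendix E that

(15.8) $n(\log\circ\beta)\left(\dfrac{v}{\sqrt{n \operatorname{Var}(X_1)}}\right) - \dfrac{v\sqrt{n}\,\beta'(0)}{\sqrt{\operatorname{Var}(X_1)}}$
$\quad = -n\left(1 - \beta\left(\dfrac{v}{\sqrt{n \operatorname{Var}(X_1)}}\right)\right) - \dfrac{n}{2}\left(1 - \beta\left(\dfrac{v}{\sqrt{n \operatorname{Var}(X_1)}}\right)\right)^2 + n\,o(n^{-1}) - \dfrac{v\sqrt{n}\,\beta'(0)}{\sqrt{\operatorname{Var}(X_1)}}.$

We first compute the limit of the sum of the first and last terms, using
Proposition 3 of Appendix E:

$-n\left(1 - \beta\left(\dfrac{v}{\sqrt{n \operatorname{Var}(X_1)}}\right)\right) - \dfrac{v\sqrt{n}\,\beta'(0)}{\sqrt{\operatorname{Var}(X_1)}}$
$\quad = n\left(\dfrac{v\beta'(0)}{\sqrt{n \operatorname{Var}(X_1)}} + \dfrac{v^2\beta''(0)}{2n \operatorname{Var}(X_1)} + o(n^{-1})\right) - \dfrac{v\sqrt{n}\,\beta'(0)}{\sqrt{\operatorname{Var}(X_1)}}$
$\quad \to \dfrac{v^2\beta''(0)}{2\operatorname{Var}(X_1)}$ as $n \to \infty$.

For the limit of the second term in (15.8), we multiply and divide by $v^2/\operatorname{Var}(X_1)$
and use the definition of $\beta'(0)$:

$\lim_{n\to\infty} -\dfrac{n}{2}\left(1 - \beta\left(\dfrac{v}{\sqrt{n \operatorname{Var}(X_1)}}\right)\right)^2 = -\dfrac{v^2(\beta'(0))^2}{2\operatorname{Var}(X_1)}.$

Thus, the entire expression in (15.8) converges to

$\dfrac{v^2\beta''(0)}{2\operatorname{Var}(X_1)} - \dfrac{v^2(\beta'(0))^2}{2\operatorname{Var}(X_1)} = -\dfrac{v^2}{2},$

as desired. □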
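
The convergence asserted in Theorem 3 can be watched directly in a case where everything is computable. The Python sketch below takes fair coin flips $X_k \in \{0,1\}$ (our choice of example), computes the distribution function of $Z_n$ exactly from binomial probabilities, and reports its largest distance from the standard normal distribution function. The function names are hypothetical, and `math.comb` assumes Python 3.8 or later.

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sup_distance(n: int) -> float:
    """sup over x of |P[Z_n <= x] - Phi(x)| for fair coin flips X_k in {0, 1}.

    The distribution function of Z_n is a step function and Phi is continuous
    and increasing, so the supremum is attained at a jump, either at the
    value there or at the left limit.
    """
    mean, sd = n / 2.0, math.sqrt(n) / 2.0
    worst, cdf = 0.0, 0.0
    for k in range(n + 1):
        phi = normal_cdf((k - mean) / sd)
        worst = max(worst, abs(cdf - phi))     # left limit at the jump
        cdf += math.comb(n, k) / 2 ** n        # exact integer ratio, then rounded
        worst = max(worst, abs(cdf - phi))     # value at the jump
    return worst

distances = [sup_distance(n) for n in (10, 100, 1000)]
print(distances)
```

The distances shrink roughly like $1/\sqrt{n}$, which is the size of the largest jump of the binomial distribution function.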


Problem 4. Give two variations of the preceding proof: (i) by initially showing 
that there is no loss of generality in assuming that the summands have mean 0 and 
(ii) by using Proposition 2 of Appendix E and then applying Proposition 3 of that 
appendix to $\log\circ\beta$, rather than to $\log$ and $\beta$ separately.


Problem 5. Use the Classical Central Limit Theorem and facts about Poisson ran- 
dom variables to formulate and prove a convergence in distribution statement in- 
volving Poisson random variables with mean n. Then prove the same statement 
without using the Classical Central Limit Theorem. For each proof, decide whether 
the assumption that the means are Z-valued is relevant. Hint: Figure 15.1 shows,
with jumps filled in as a visual aid, the normal distribution function and a
normalized Poisson distribution function both having mean 0 and variance 1. The
mean of the unnormalized Poisson distribution function (not shown) is 4.

FIGURE 15.1. Normal and Poisson distribution functions (horizontal axis from -2.5 to 2.5)


Problem 6. Use tables of the normal distribution, possibly within your calculator, 
to approximate the probability that there are at least 520 heads in 1000 flips of a 
fair coin. 


Problem 7. Let $Z$ be a normally distributed random variable with mean 0 and
variance 1. Let $(X_k \colon k = 1,2,\dots)$ be an iid sequence of R-valued random variables
and set $S_n = \sum_{k=1}^{n} X_k$ for each $n$. Suppose that

$\dfrac{S_n - b_n}{c\sqrt{n}} \xrightarrow{\mathcal{D}} Z$

for some positive constant $c$ and some sequence $(b_1, b_2, \dots)$ of constants. Prove
that $X_1$ has finite variance.


15.4. The general setting for iid sequences 


As in previous sections, we consider the partial sums $S_n = X_1 + \cdots + X_n$ of an
iid sequence $(X_n \colon n = 1,2,\dots)$. Denote the distribution of $X_1$ by $Q$. A general
question in the spirit of Theorem 3 is: Do there exist positive constants $a_n$ and
real constants $b_n$ such that $((S_n - b_n)/a_n \colon n = 1,2,\dots)$ converges in distribution
to a nondegenerate limit? An alternative version of the same question is: Do
there exist positive constants $a_n$ and real constants $b_n$ such that $(Q_n \colon n =
1,2,\dots)$ converges to a nondegenerate limit, where $Q_n$ is defined by

$Q_n(B) = Q^{*n}(a_n B + b_n), \quad B$ Borel.


Here, ‘degenerate’ signifies a delta distribution when applied to distributions and 
an almost surely constant random variable when applied to random variables. 
By Theorem 24 of Chapter 14 we know that if the answer to the above question 
is ‘yes’, then the type of limit is uniquely determined. In that case we say that Q 
is in the domain of attraction of that limiting type. For instance, the Classical 
Central Limit Theorem says that all distributions having finite nonzero variance 
are in the domain of attraction of the normal type. It develops that there are 
distributions having infinite variance that are in the domain of attraction of the 
normal type, although Problem 7 indicates that scaling proportional to $\sqrt{n}$ is
not appropriate for such a distribution. Also, there are distributions that belong 
to no domain of attraction. Variations of terminology are used: the domain of 
attraction of a particular distribution really means the domain of attraction of 
the type to which that distribution belongs; and the terminology is also carried 
over to characteristic functions and random variables. 

Similar questions arise when no centering is permitted. Chapter 14 applies to 
show that a limit different from the delta distribution at 0 is unique up to strict 
type, and thus, apart from the strict type consisting of the delta distribution at 
0, each strict type has a well-defined domain of strict attraction, possibly empty. 
For instance, the Classical Central Limit Theorem says that all distributions 
having 0 mean and finite nonzero variance are in the domain of strict attraction 
of the normal distribution with mean 0 and variance 1. 

The following definition is relevant to the preceding paragraphs. 


Definition 4. Let $Q$ be a distribution on R. The distribution $Q$ is stable if
$Q^{*n}$ is of the same type as $Q$ for every $n$. It is strictly stable if $Q^{*n}$ is of the
same strict type as $Q$ for every $n$.


The adjectives ‘stable’ and ‘strictly stable’ are used for random variables as 
well as for their distributions—and also for their distribution functions and char- 
acteristic functions (and moment generating functions when appropriate). 

It is clear that every distribution of the same type as a stable distribution is 
also stable, and that every distribution of the same strict type as a strictly stable 
distribution is also strictly stable. Accordingly, we may speak of a type being or 
not being stable and of a strict type being or not being strictly stable. 


Problem 8. Prove that every stable distribution on R is infinitely divisible. 
* Problem 9. Decide if the Poisson distributions are strictly stable. 
Problem 10. Decide if the gamma distributions are stable. 


Problem 11. Decide if the Cauchy distributions are stable. Which if any of them 
are strictly stable? 




Proposition 5. The distributions on R that have nonempty domains of at- 
traction are the stable distributions. Those that have nonempty domains of strict 
attraction are the strictly stable distributions. 


PARTIAL PROOF. We treat only domains of strict attraction in this proof.
The proof for domains of attraction is requested in Problem 12. Suppose that
a distribution $Q$ is in the domain of strict attraction of a distribution $R$. Let
$(X_k \colon k = 1,2,\dots)$ be an iid sequence with common distribution $Q$, $Y$ a random
variable with distribution $R$, and $(a_n \colon n = 1,2,\dots)$ a sequence of positive
constants such that

$\dfrac{\sum_{k=1}^{n} X_k}{a_n} \xrightarrow{\mathcal{D}} Y.$

Fix a positive integer $m$ and let $(Y_1, Y_2, \dots, Y_m)$ be a sequence of independent
random variables each having distribution $R$.
On the one hand

$\dfrac{\sum_{k=1}^{mn} X_k}{a_{mn}} \xrightarrow{\mathcal{D}} Y$

as $n \to \infty$, and on the other hand, by Proposition 6 of Chapter 14,

$\dfrac{\sum_{k=1}^{mn} X_k}{a_n} = \sum_{j=1}^{m} \dfrac{\sum_{k=(j-1)n+1}^{jn} X_k}{a_n} \xrightarrow{\mathcal{D}} \sum_{k=1}^{m} Y_k$

as $n \to \infty$. By Theorem 25 of Chapter 14, $Y$ and $\sum_{k=1}^{m} Y_k$ are of the same strict
type, and so, according to Definition 4, $R$ is strictly stable.

Now suppose that $Y$ is strictly stable. Let $(Y_1, Y_2, \dots)$ be an iid sequence
of random variables each having the same distribution as $Y$. By the definition
of strict stability, there exists, for each $n$, a positive constant $a_n$ such that
$a_n^{-1} \sum_{k=1}^{n} Y_k$ has the same distribution as $Y$. Hence,

$\dfrac{\sum_{k=1}^{n} Y_k}{a_n} \xrightarrow{\mathcal{D}} Y.$

Therefore, $Y$ is in its own domain of strict attraction. □


Problem 12. Complete the proof of the preceding proposition. 


In Chapter 17, the issue of characterizing stable distributions and identifying 
their domains of attraction will be treated. Here, as an introduction to this 
topic, are exercises concerning some stable distributions. 


Problem 13. Let $\alpha$ be a real number. Show that

$u \mapsto e^{-u^{\alpha}}, \quad u \in (0,\infty),$

is the moment generating function of a probability distribution on $\mathbb{R}^+$ if and only
if $0 < \alpha \le 1$, in which case the corresponding probability distribution is strictly
stable. Hint: Use Theorem 27 of Chapter 14.




* Problem 14. Identify the strictly stable type obtained by setting $\alpha = 1$ in the
preceding problem.


Problem 15. Use the Law of Large Numbers to identify many distributions in the 
domain of strict attraction of the type identified in the preceding problem. 


* Problem 16. [Strictly stable distribution on $\mathbb{R}^+$ of index $\tfrac{1}{2}$] Show that the strictly
stable distribution on $\mathbb{R}^+$ obtained by setting $\alpha = \tfrac{1}{2}$ in Problem 13 has a density
with respect to Lebesgue measure of the form

$x \mapsto \dfrac{a}{\sqrt{2\pi x^3}}\, e^{-a^2/(2x)}$

for some positive constant $a$. Also, find $a$.
Problem 17. Calculate the moments of the distribution in Problem 16. 
Problem 18. Which of the distributions of Problem 13 have finite expectations? 


Problem 19. Let $\alpha \in (0,1]$ and let $(X_1, X_2, \dots)$ be an iid sequence of strictly
stable $\mathbb{R}^+$-valued random variables having the moment generating function given
in Problem 13. Find $a_n$ for each positive $n$ so that $\frac{1}{a_n}\sum_{k=1}^{n} X_k$ has the same
distribution as $X_1$.


15.5. † Large deviations


The Law of Large Numbers implies that if $(X_n \colon n = 1,2,\dots)$ is an iid sequence
of R-valued random variables with finite mean, then for all $\varepsilon > 0$,

$P\left[\sum_{k=1}^{n} X_k > n[E(X_1) + \varepsilon]\right] \to 0$ as $n \to \infty$.

The event in this expression could be viewed as a ‘large deviation’ by the partial
sum $S_n = X_1 + \cdots + X_n$ above its mean, which is $nE(X_1)$. A similar statement
holds for the probability of large deviations below $E(S_n)$, that is, for the event
that $S_n$ is less than $n(E(X_1) - \varepsilon)$. Our original proof of the Law of Large
Numbers in Chapter 5, which used the Chebyshev Inequality, provided us with
an upper bound for the probability of large deviations, but this bound is not
very good in many cases. Our more recent proof, using characteristic functions,
was designed to include some cases in which $E(X_1)$ does not exist. In terms
of providing useful upper bounds for large deviations probabilities, this second
method is even worse than the first.

It turns out that under an additional assumption on the distribution of $X_1$,
we can use the Markov Inequality to obtain vastly improved large deviations
estimates. The extra assumption needed is that there exists a constant $a > 0$
such that

(15.9) $E(e^{aX_1}) < \infty.$




Throughout this section, we will let

$\varphi(b) = E(e^{bX_1}).$

Note that if $a > 0$ is such that (15.9) is satisfied, then $\varphi(b)$ exists and is finite
for all $b \in (0, a]$.


Theorem 6. [Large Deviations] Let $(X_n \colon n = 1,2,\dots)$ be an iid sequence
of R-valued random variables satisfying (15.9) for some constant $a > 0$. Then
for all $b \in [0,a]$,

(15.10) $P\left[\sum_{k=1}^{n} X_k > n(E(X_1) + \varepsilon)\right] \le \left[e^{-b(E(X_1)+\varepsilon)} \varphi(b)\right]^n.$

Furthermore, $b \in [0,a]$ may be chosen so that

(15.11) $e^{-b(E(X_1)+\varepsilon)} \varphi(b) < 1.$


PROOF. We rewrite the left side of (15.10) as

$P\left[\sum_{k=1}^{n} [X_k - E(X_k)] > n\varepsilon\right],$

and then apply the Markov Inequality (with $f(x) = e^{bx}$) to obtain the upper
bound

$e^{-bn\varepsilon}\, E\left(e^{b\sum_{k=1}^{n} [X_k - E(X_k)]}\right).$

Since the random variables $X_k$ are independent, the expectation in this expression
factors by Corollary 12 of Chapter 9, and (15.10) follows.

To prove the rest of the theorem, we calculate the right-hand derivative of
the left side of (15.11) with respect to $b$ at $b = 0$. The result is $-(E(X_1) +
\varepsilon) + E(X_1) = -\varepsilon$. (The reader may check that the differentiation inside the
expectation is justified by splitting $X_1$ into positive and negative parts, and then
applying the Monotone and Dominated Convergence Theorems. The argument is
similar to that used to differentiate moment generating functions in Chapter 13.)
Since this derivative is negative, and since the left side of (15.11) equals 1 when
$b = 0$, there exists a $b \in (0,a]$ such that (15.11) is satisfied. □
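
To see how the bound (15.10) behaves in practice, the Python sketch below treats an example of our own choosing: $P[X_1 = 1] = P[X_1 = -1] = \frac12$, for which $E(X_1) = 0$, $\varphi(b) = \cosh b$, and (15.9) holds for every $a > 0$. It minimizes the right side of (15.10) over a grid of $b \in [0,a]$ and compares the result with the exact tail probability. The value $a = 2$, the grid search, and the function names are all assumptions of this illustration.

```python
import math

def phi(b: float) -> float:
    """phi(b) = E(exp(b X_1)) = cosh(b) when P[X_1 = 1] = P[X_1 = -1] = 1/2."""
    return math.cosh(b)

def bound(n: int, eps: float, a: float = 2.0, grid: int = 2000) -> float:
    """Smallest value of [exp(-b*eps) * phi(b)]^n over a grid of b in [0, a].

    Here E(X_1) = 0, so the exponent in (15.10) reduces to -b*eps.
    """
    base = min(math.exp(-(a * j / grid) * eps) * phi(a * j / grid)
               for j in range(grid + 1))
    return base ** n

def exact_tail(n: int, eps: float) -> float:
    """P[S_n > n*eps], using S_n = 2H - n with H binomial(n, 1/2)."""
    favorable = sum(math.comb(n, h) for h in range(n + 1)
                    if 2 * h - n > n * eps)
    return favorable / 2 ** n

for n in (100, 400):
    print(n, exact_tail(n, 0.2), bound(n, 0.2))
```

Because the bracketed quantity in (15.10) is raised to the power $n$, any $b$ that makes it smaller than 1 yields a bound that decays geometrically in $n$, which is the content of (15.11).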


* Problem 20. Find the minimum value of the left side of (15.11) in each of the
following cases: (i) $X_1$ has a Bernoulli distribution with parameter $p$; (ii) $X_1$ has
a standard exponential distribution; (iii) $X_1$ has a standard normal distribution.


Problem 21. Use Theorem 6 to give an alternate proof of the Strong Law of Large 
Numbers for an iid sequence that satisfies (15.9) for some a > 0. 


It turns out that the minima found in Problem 20 provide, in a certain sense, 
the sharpest possible large deviations estimates for the distributions treated in 
that problem. We will not pursue this matter in this book. 




15.6. † Local limit theorems


The Classical Central Limit Theorem may be restated briefly as follows: if $F_n$
is the distribution function of $[S_n - nE(X_1)]/\sqrt{n \operatorname{Var}(X_1)}$, where $S_n$ is the $n$th
partial sum of an iid sequence $(X_k)$, then $(F_n)$ converges pointwise as $n \to \infty$
to the standard normal distribution function. Thus, it is a result about the 
convergence of sequences of distribution functions. The main results of this 
section concern the pointwise convergence of sequences of density functions. Such 
results are known as ‘local limit theorems’. 

We will state and prove local limit theorems for two important cases, essen- 
tially corresponding to the two cases for which we obtained inversion formulas 
in Chapter 13. In each case, our proof will involve showing that we can move 
the relevant limit outside the integral sign in the inversion formula. 

We begin with the case in which the distributions under consideration have 
a density with respect to Lebesgue measure on R. For this case, we need two 
preliminary results concerning the characteristic function of a distribution that 
has a density. 


Lemma 7. [Riemann-Lebesgue] Let $Q$ be a probability measure on R with
characteristic function $\beta$. Suppose that $Q$ has a density $f$ with respect to Lebesgue
measure on R. Then $\lim_{v\to\infty} \beta(v) = 0$.


PROOF. We will actually show that if $f\colon \mathbb{R} \to \mathbb{R}$ is a nonnegative Borel
measurable function such that $\int f(x)\,dx < \infty$, then

(15.12) $\lim_{v\to\infty} \int f(x) e^{ivx}\,dx = 0.$

It is easily checked using elementary calculus that (15.12) holds when $f$ is
the indicator function of a bounded open interval. Since every open subset of R
is a countable union of pairwise disjoint open intervals, it is also easy to prove
(15.12) when $f$ is the indicator of a bounded open subset of R.

Now suppose that $f = I_A$, where $A$ is a bounded Borel subset of R. Since
Lebesgue measure is regular (see Problem 14 of Chapter 7), for each $\varepsilon > 0$, there
exists an open set $B \supseteq A$ such that $\lambda(B \setminus A) < \varepsilon$, where $\lambda$ is Lebesgue measure
on R. It follows that

$\limsup_{v\to\infty} \left|\int f(x) e^{ivx}\,dx\right| \le \lim_{v\to\infty} \left|\int I_B(x) e^{ivx}\,dx\right| + \lambda(B \setminus A) < \varepsilon.$

Thus (15.12) holds when $f$ is the indicator function of a bounded Borel subset
of R. The extension to bounded Borel measurable functions $f$ that vanish off a
bounded set is now routine.

To complete the proof, let $f$ be an arbitrary nonnegative Borel measurable
function such that $\int f(x)\,dx < \infty$. By the Monotone Convergence Theorem,
$\int f(x)\,dx = \lim_n \int f_n(x)\,dx$, where $f_n(x) = [f(x) \wedge n]\,I_{[-n,n]}(x)$. Thus, for $\varepsilon > 0$,
there exists an integer $n$ such that $\int |f(x) - f_n(x)|\,dx < \varepsilon$. For such $n$,

\[
\limsup_{v\to\infty} \Bigl|\int f(x)\,e^{ivx}\,dx\Bigr|
\le \lim_{v\to\infty} \Bigl|\int f_n(x)\,e^{ivx}\,dx\Bigr| + \int |f(x) - f_n(x)|\,dx < \varepsilon,
\]

and (15.12) follows. □
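As a numerical illustration of the first step of this proof (not part of the original text), the following Python sketch evaluates the integral in (15.12) in closed form when f is the indicator function of a bounded interval (a, b). The function name `indicator_transform` and the sample endpoints are assumptions made only for this illustration.

```python
import cmath

def indicator_transform(a, b, v):
    """Integral of exp(i*v*x) over the interval (a, b), for v != 0."""
    return (cmath.exp(1j * v * b) - cmath.exp(1j * v * a)) / (1j * v)

a, b = -0.5, 2.0  # hypothetical endpoints chosen for the illustration
for v in (1.0, 10.0, 100.0, 1000.0):
    # The modulus is at most 2/v, so it tends to 0 as v grows.
    print(v, abs(indicator_transform(a, b, v)))
```

The bound 2/v comes from the triangle inequality applied to the numerator, which is the elementary-calculus check mentioned in the proof.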


The second preliminary result is an improvement on Problem 56 of Chapter 13: 


Lemma 8. [Parseval Formula] Let $Q$ be a probability measure on $\mathbb{R}$ with
characteristic function $\beta$. Suppose that $Q$ has a density $f$ with respect to Lebesgue
measure on $\mathbb{R}$, and that $\int f^2(x)\,dx < \infty$. Then

\[
\int f^2(x)\,dx = \frac{1}{2\pi} \int |\beta(v)|^2\,dv.
\]


PROOF. Let

\[
g(x) = \int f(x+y)\,f(y)\,dy = (f * \tilde f)(x),
\]

where $\tilde f$ is the function $x \mapsto f(-x)$. Note that $g$ is the density of the symmetrization
of $Q$, and that the characteristic function of $g$ is $|\beta|^2$ (see Problem 17 of
Chapter 13). Also note that

\[
g(0) = \int f^2(x)\,dx.
\]


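By the Cauchy-Schwarz Inequality and the definition of $g$,

\[
g^2(x) \le \Bigl[\int f^2(y)\,dy\Bigr]\Bigl[\int f^2(x+y)\,dy\Bigr] = \Bigl[\int f^2(y)\,dy\Bigr]^2,
\]

so $g$ is bounded.
We wish to show that $g$ is continuous. Again applying the Cauchy-Schwarz
Inequality, we have

\[
[g(x) - g(z)]^2 \le \Bigl[\int f^2(y)\,dy\Bigr]\Bigl[\int \bigl(f(x+y) - f(z+y)\bigr)^2\,dy\Bigr].
\]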


After making the change of variables $y \mapsto y - z$, we see that in order to prove
the continuity of $g$, it is enough to prove that

\[
\lim_{x\to 0} \int [f(x+y) - f(y)]^2\,dy = 0. \tag{15.13}
\]


The proof of (15.13) is similar to the proof of (15.12). It is easy to see that
(15.13) holds if $f$ is the indicator function of a bounded open set, and the
extension to indicator functions of bounded Borel sets is also straightforward. In order
to treat simple functions $f$ that vanish off a bounded set, expand the integrand
as a finite sum of terms of the form $a_{ij}[f_i(x+y) - f_i(y)][f_j(x+y) - f_j(y)]$, where
each $a_{ij}$ is a constant, and each $f_i$ is the indicator function of a bounded Borel
set. For the terms with $i = j$, apply the result of the previous step. For the terms
with $i \ne j$, use the result of the previous step together with the Cauchy-Schwarz
Inequality. The extension to bounded Borel measurable functions $f$ that vanish
off a bounded set is done by approximating such functions uniformly with
measurable simple functions. And finally, the extension to nonnegative Borel
functions $f$ such that $\int f^2(x)\,dx < \infty$ is straightforward.

We have shown that $g$ is a continuous bounded function whose value at 0
is $\int f^2(x)\,dx$. For $\delta > 0$, let $h_\delta$ be the continuous density function of the
normal distribution with mean 0 and variance $\delta$, and let $\gamma_\delta$ be the corresponding
characteristic function. Then $g * h_\delta$ has characteristic function $|\beta|^2\gamma_\delta$. This
characteristic function has finite integral with respect to Lebesgue measure on
$\mathbb{R}$, so the Inversion Formula, Theorem 14 of Chapter 13, applies to give

\[
(g * h_\delta)(0) = \frac{1}{2\pi} \int |\beta(v)|^2\,\gamma_\delta(v)\,dv.
\]

As $\delta$ goes to 0, the right side of this equation goes to $\frac{1}{2\pi}\int |\beta(v)|^2\,dv$ by the
Monotone Convergence Theorem. Since $g$ is bounded and continuous, the left
side is easily seen to converge to $g(0)$ as $\delta$ goes to 0. Since $g(0) = \int f^2(x)\,dx$,
the proof is complete. □
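The Parseval Formula can be checked numerically in a case where both sides are known. The sketch below is an illustration added here, not the authors' method: it takes f to be the normal density with mean 0 and standard deviation s, whose characteristic function is exp(-s^2 v^2 / 2), and compares the two integrals using a plain trapezoid rule. The value s = 1.5, the integration range, and the helper name `trapezoid` are assumptions.

```python
import math

def trapezoid(func, lo, hi, steps):
    """Composite trapezoid rule for the integral of func over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    for i in range(1, steps):
        total += func(lo + i * h)
    return total * h

s = 1.5  # hypothetical standard deviation

def f(x):
    """Normal density with mean 0 and standard deviation s."""
    return math.exp(-x * x / (2 * s * s)) / (s * math.sqrt(2 * math.pi))

def beta_squared(v):
    """Squared modulus of the characteristic function of f."""
    return math.exp(-(s * v) ** 2)

lhs = trapezoid(lambda x: f(x) ** 2, -30.0, 30.0, 60000)
rhs = trapezoid(beta_squared, -30.0, 30.0, 60000) / (2 * math.pi)
print(lhs, rhs, 1 / (2 * s * math.sqrt(math.pi)))
```

Both integrals equal 1/(2 s sqrt(pi)) in this case, so the three printed numbers should agree up to quadrature error.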


We are now ready to state and prove our first local limit theorem: 


Theorem 9. [Local Limit—Continuous Case] Let (Xn: n = 1,2,...) be an 
iid sequence of $\mathbb{R}$-valued random variables with finite mean $\mu$ and finite variance
$\sigma^2 > 0$. Suppose that the distribution of $X_1$ has density $f$ with respect to Lebesgue
measure, and further suppose there exists a positive integer $k$ such that

\[
\int [f^{*k}(x)]^2\,dx < \infty.
\]

Let $p_n$ be the density with respect to Lebesgue measure of

\[
\frac{X_1 + \dots + X_n - n\mu}{\sqrt{n\sigma^2}},
\]

and let $p$ be the density with respect to Lebesgue measure of the standard normal
distribution. Then

\[
\lim_{n\to\infty} \sup_{x\in\mathbb{R}} |p_n(x) - p(x)| = 0.
\]

PROOF. For simplicity, we assume that $\mu = 0$ and $\sigma^2 = 1$. The generalization
to arbitrary finite $\mu, \sigma^2$ is routine. Let $\beta$ be the characteristic function of $X_1$.
Thus, for $n = 1,2,\dots$, the characteristic function corresponding to $f^{*n}$ is $\beta^n$.
The assumption that $\int [f^{*k}(x)]^2\,dx < \infty$ and Lemma 8 imply that $\int |\beta(v)|^{2k}\,dv < \infty$.
Since $|\beta| \le 1$, it follows that $\int |\beta(v)|^n\,dv < \infty$ for all $n \ge 2k$.

The characteristic function corresponding to $p_n$ is $v \mapsto \beta^n(v/\sqrt{n})$. By the
argument in the preceding paragraph, this function has finite integral with respect
to Lebesgue measure on $\mathbb{R}$ when $n \ge 2k$. For such $n$, the Inversion Formula,
Theorem 14 of Chapter 13, gives

\[
p_n(x) = \frac{1}{2\pi} \int e^{-ivx}\,\beta^n(v/\sqrt{n})\,dv, \quad x \in \mathbb{R}.
\]

The same Inversion Formula gives

\[
p(x) = \frac{1}{2\pi} \int e^{-ivx}\,\gamma(v)\,dv, \quad x \in \mathbb{R},
\]

where $\gamma$ is the characteristic function of the standard normal distribution. Thus,
it is enough to show that

\[
\lim_{n\to\infty} \int |\beta^n(v/\sqrt{n}) - \gamma(v)|\,dv = 0. \tag{15.14}
\]
n—+00 
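We will break the integral in (15.14) into three pieces:

\[
\int_{|v| < A_n} |\beta^n(v/\sqrt{n}) - \gamma(v)|\,dv; \tag{15.15}
\]
\[
\int_{A_n \le |v| \le B\sqrt{n}} |\beta^n(v/\sqrt{n}) - \gamma(v)|\,dv; \tag{15.16}
\]
\[
\int_{|v| > B\sqrt{n}} |\beta^n(v/\sqrt{n}) - \gamma(v)|\,dv. \tag{15.17}
\]

The constants $A_n$, $n = 1,2,\dots$, and $B$ are to be chosen later.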


By the Classical Central Limit Theorem and the Continuity Theorem, the
integrand in (15.15) goes to 0 uniformly on bounded intervals as $n \to \infty$. It
follows that if the sequence of constants $(A_n\colon n = 1,2,\dots)$ increases to $\infty$ slowly
enough, the integral in (15.15) converges to 0 as $n \to \infty$. Choose $(A_n\colon n =
1,2,\dots)$ to be any sequence with this property. To avoid notational technicalities
with the limits of integration, we may assume that $A_n/\sqrt{n} \to 0$ as $n \to \infty$, so
that no matter how $B > 0$ is chosen, we have $A_n < B\sqrt{n}$ for large $n$.

The integral in (15.17) is bounded above by

\[
\sqrt{n} \int_{|v| > B} |\beta(v)|^n\,dv + \int_{|v| > \sqrt{n}B} |\gamma(v)|\,dv. \tag{15.18}
\]

The second term in this expression goes to 0 as $n \to \infty$ by the Dominated
Convergence Theorem. By Problem 50 of Chapter 13, $|\beta(v)| < 1$ for $v \ne 0$, so since
$\beta$ is continuous, the Riemann-Lebesgue Lemma implies that $\sup_{|v| \ge B} |\beta(v)| < 1$
for any $B > 0$. Therefore, for any fixed $B$ there exists an integer $l$ such that the
sequence $(\sqrt{n}\,|\beta(v)|^n\colon n \ge l)$ decreases for $|v| \ge B$. Since $\int |\beta|^n\,dv < \infty$ for all
$n \ge 2k$, the Dominated Convergence Theorem implies that the first integral in
(15.18) goes to 0 as $n \to \infty$ for any $B > 0$.




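It remains to show that the integral in (15.16) goes to 0 as $n \to \infty$ for $A_n$
as chosen earlier and for some $B > 0$. We apply the inequality in (E.10) of
Appendix E to $\beta(v) = E(e^{ivX_1})$:

\[
\Bigl|\beta(v) - 1 + \frac{v^2}{2}\Bigr| \le E\Bigl[\frac{|X_1|^3 |v|^3}{6} \wedge X_1^2 v^2\Bigr] = v^2 h(v),
\]

where $h(v)$ is a nonnegative function that goes to 0 as $v \to 0$ by the Dominated
Convergence Theorem. Choose $B$ small enough so that if $|v| \le B$, then $h(v) \le
1/4$ and $v^2 < 2$. For such $v$ we then have

\[
|\beta(v)| \le 1 - \frac{v^2}{2} + \frac{v^2}{4} = 1 - \frac{v^2}{4} \le e^{-v^2/4}.
\]

It follows that for $|v| \le B\sqrt{n}$, $|\beta(v/\sqrt{n})|^n \le e^{-v^2/4}$, so (15.16) is bounded
above by

\[
2 \int_{A_n \le |v| \le B\sqrt{n}} e^{-v^2/4}\,dv,
\]

which is easily seen to go to 0 as $n \to \infty$. □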
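To see the conclusion of Theorem 9 in a concrete case, the following Python sketch (an illustration added here, under stated assumptions) takes the X_i to be exponential with mean 1. Then f(x) = exp(-x) for x > 0 is square integrable, so the hypothesis holds with k = 1, and the sum of n terms has a gamma density. The supremum is only approximated on a finite grid; the grid bounds, step, and function names are assumptions.

```python
import math

def normal_density(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def standardized_sum_density(n, x):
    """Density at x of (X_1 + ... + X_n - n) / sqrt(n) for iid Exponential(1) X_i."""
    t = n + x * math.sqrt(n)          # corresponding value of the partial sum
    if t <= 0.0:
        return 0.0
    # log of the Gamma(n, 1) density at t, computed in logs to avoid overflow
    log_gamma_density = (n - 1) * math.log(t) - t - math.lgamma(n)
    return math.sqrt(n) * math.exp(log_gamma_density)

def sup_error(n, half_width=8.0, step=0.01):
    """Grid approximation of sup over x of |p_n(x) - p(x)|."""
    points = int(2 * half_width / step) + 1
    grid = (-half_width + i * step for i in range(points))
    return max(abs(standardized_sum_density(n, x) - normal_density(x)) for x in grid)

for n in (4, 25, 100, 400):
    print(n, sup_error(n))
```

The printed errors should shrink as n grows, in line with the uniform convergence asserted by the theorem.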


Problem 22. Show that the hypothesis $\int [f^{*k}(x)]^2\,dx < \infty$ is satisfied for some
positive integer $k$ by every density $f$ introduced so far in this book (see the index
for a list). For which of these distributions does the conclusion of the Local Limit
Theorem hold?


Problem 23. Let

\[
f(x) = \begin{cases}
\dfrac{1}{2|x| \log^2 |x|} & \text{if } |x| < \dfrac{1}{e} \\[1ex]
0 & \text{otherwise.}
\end{cases}
\]

Show that $f$ is a density with respect to Lebesgue measure of some distribution
$Q$ on $\mathbb{R}$ with finite mean and variance, but that the conclusion of the Local Limit
Theorem does not hold for $Q$.


We now turn our attention to a local limit theorem for the case in which 
the iid random variables X1, X2,..., are Z-valued. Since we will be scaling and 
centering, it turns out to be convenient to work in a more general setting. 


Definition 10. A lattice in $\mathbb{R}$ is a set $L$ of the form

\[
L = \{az + b\colon z \in \mathbb{Z}\},
\]

where $a > 0$ and $b$ are real constants, known respectively as the span and shift
of the lattice $L$. If $b = 0$, $L$ is a centered lattice. A distribution $Q$ on $\mathbb{R}$ is a
lattice distribution with span $a$ if the support of $Q$ is contained in a lattice with
span $a$ and is not contained in a lattice with larger span. A lattice distribution
$Q$ is a centered lattice distribution if 0 is a member of the support of $Q$.




Note that since we insist that the span of a lattice be real ($\infty$ is not allowed), a
single point is not a lattice, and a delta distribution is not a lattice distribution.
In other words, lattice distributions are nondegenerate. Also note that the span
of a distribution on $\mathbb{Z}$ is not necessarily 1. For example, the distribution
$\frac{1}{2}(\delta_{-1} + \delta_1)$ has span 2.
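For a distribution with finite support contained in the integers, the span can be computed directly. The sketch below is an added illustration, not part of the original text. It relies on the elementary fact that the lattices containing a set of integers are those whose span divides every difference of two support points, so the largest admissible span is the greatest common divisor of those differences. The function name `integer_support_span` and the restriction to finite integer supports are assumptions.

```python
from functools import reduce
from math import gcd

def integer_support_span(support):
    """Span of a distribution whose support is the given finite set of integers."""
    points = sorted(set(support))
    if len(points) < 2:
        # a delta distribution is not a lattice distribution
        raise ValueError("support must contain at least two points")
    base = points[0]
    # gcd of the differences from the smallest support point
    return reduce(gcd, (p - base for p in points[1:]))

print(integer_support_span([-1, 1]))     # the example in the text
print(integer_support_span([0, 1]))      # Bernoulli-type support
print(integer_support_span([1, 4, 10]))  # differences 3 and 9
```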


Problem 24. Let $Q$ be a distribution on $\mathbb{R}$ with characteristic function $\beta$. Show
that $Q$ is a lattice distribution if and only if the quantity

\[
\lambda = \inf\{v > 0\colon |\beta(v)| = 1\}
\]

is positive and finite, in which case $2\pi/\lambda$ is the span of $Q$. Also show that if $\lambda$ is
positive and finite and $\beta(\lambda) = 1$, then $Q$ is a centered lattice distribution and $\beta$ is
periodic with period $\lambda$.


Problem 25. Show that if $Q$ is a lattice distribution with span $a$, then so is $Q^{*n}$
for $n = 1,2,\dots$. Hint: The statement is easy to prove if “lattice” is replaced by
“centered lattice”.


Lattice distributions do not have densities with respect to Lebesgue measure,
so a local limit theorem for lattice distributions will necessarily be worded
somewhat differently than one for distributions with densities. Let $(X_n\colon n = 1,2,\dots)$
be an iid sequence of $\mathbb{R}$-valued random variables with finite mean $\mu$, finite
variance $\sigma^2 > 0$, and distribution $Q$. Assume that $Q$ is a lattice distribution with
span $a$. Thus, it is supported by $L = \{ax + b\colon x \in \mathbb{Z}\}$ for some real constant $b$.
For each $n$, let

\[
Z_n = \frac{X_1 + \dots + X_n - n\mu}{\sqrt{n\sigma^2}}.
\]

Then the support of $Z_n$ is contained in the lattice

\[
L_n = \Bigl\{ \frac{ax}{\sqrt{n\sigma^2}} + \frac{n(b - \mu)}{\sqrt{n\sigma^2}}\colon x \in \mathbb{Z} \Bigr\}.
\]

Let $m_n$ denote counting measure on $L_n$, and set

\[
\lambda_n = \frac{a}{\sqrt{n\sigma^2}}\, m_n.
\]

Since $\lambda_n$ approximates Lebesgue measure in a certain sense (see Problem 27),
we might hope that the density of $Z_n$ with respect to $\lambda_n$ will approximate the
standard normal density in a certain sense, and the following result justifies this
hope:


Theorem 11. [Local Limit—Lattice Case] For n = 1,2,..., let Zn, Ln, and 
$\lambda_n$ be as in the preceding paragraph, and let $p_n\colon L_n \to \mathbb{R}^+$ be the density of the
distribution of $Z_n$ with respect to $\lambda_n$. Then

\[
\lim_{n\to\infty} \sup_{x \in L_n} |p_n(x) - p(x)| = 0,
\]

where $p$ is the density with respect to Lebesgue measure of the standard normal
distribution.
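A concrete case may help fix the scaling in Theorem 11. The Python sketch below is an added illustration under stated assumptions: the X_i are Bernoulli with success probability p (span a = 1, mean p, variance p(1 - p)), and the reference measure is taken to assign mass a / sqrt(n sigma^2) to each point of L_n, so that the density of Z_n at a lattice point is the binomial probability multiplied by sqrt(n sigma^2). The function names and the value p = 0.3 are hypothetical.

```python
import math

def normal_density(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def lattice_sup_error(n, p):
    """sup over the lattice L_n of |p_n - normal density| for Bernoulli(p) steps."""
    scale = math.sqrt(n * p * (1.0 - p))   # sqrt(n * sigma^2); the span a is 1
    worst = 0.0
    for k in range(n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * math.log(p) + (n - k) * math.log(1.0 - p))
        density = scale * math.exp(log_pmf)  # density with respect to the scaled counting measure
        x = (k - n * p) / scale              # lattice point corresponding to S_n = k
        worst = max(worst, abs(density - normal_density(x)))
    # Outside {0, ..., n} the density p_n vanishes, so the error there equals the
    # normal density, which is largest at the two nearest lattice points.
    for k in (-1, n + 1):
        worst = max(worst, normal_density((k - n * p) / scale))
    return worst

for n in (10, 100, 1000):
    print(n, lattice_sup_error(n, 0.3))
```

The errors should decrease as n increases, which is the content of the theorem in this special case.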


Problem 26. Prove the preceding theorem. Hint: Assume $\mu = 0$, $\sigma^2 = 1$ and let $\beta$
be the characteristic function of $X_1$. Use the Inversion Theorem, Theorem 13 of
Chapter 13, appropriately modified for the lattice case, to show that

\[
p_n(x) = \frac{1}{2\pi} \int_{-\pi\sqrt{n}/a}^{\pi\sqrt{n}/a} e^{-ivx}\,\beta^n(v/\sqrt{n})\,dv.
\]

Then follow the proof of Theorem 9.


Problem 27. State and prove a precise statement about the sense in which $\lambda_n$
approximates Lebesgue measure on $\mathbb{R}$.


* Problem 28. Let $(S_n)$ be a simple random walk on $\mathbb{Z}$ with expected step size $2p - 1$.
Restate Theorem 11 as a statement about the asymptotic behavior of the quantities
$P[S_n = x]$, $x \in \mathbb{Z}$, as $n \to \infty$.


Problem 29. Describe the relation between the sizes of the jumps of the Poisson
distribution function in Figure 15.1 and the values of the Poisson density shown in
Figure 15.2. In this figure, the densities of Theorem 11 are shown for $n = 4$ and
$X_i$ Poisson distributed with mean 1.


FIGURE 15.2. Normal and Poisson densities


CHAPTER 16 


Infinitely Divisible Distributions 
as Limits 


Recall that an infinitely divisible distribution is one that, for each $n$, is equal
to $Q_n^{*n}$ for some distribution $Q_n$. In preceding chapters several infinitely
divisible distributions have appeared. In particular all the stable distributions are
infinitely divisible, as is easily seen by comparing the definitions of these two
concepts. In this chapter we characterize all infinitely divisible characteristic
functions. This characterization is based on the family of ‘compound Poisson
distributions’, to be introduced in the first section.

Infinitely divisible distributions arise naturally as limits of sequences of dis- 
tributions of sums of independent random variables. The relevant story is told 
in this chapter with some repetition, starting with some important special cases 
and then moving on to more general cases that encompass the earlier special 
cases. The repetition is warranted since in the most general case the story is 
quite intricate. 

In the $\mathbb{R}^+$-setting, there are fewer complications than in the $\mathbb{R}$-setting,
especially if moment generating functions are used rather than characteristic
functions. This issue is treated and the important extension to the $\mathbb{R}^d$-setting is also
examined.


16.1. Compound Poisson distributions 


When speaking of a Poisson random variable we typically mean a standard
Poisson random variable, a $\mathbb{Z}^+$-valued random variable that has positive probability
of equaling 1. Accordingly, we introduce the terminology generalized Poisson
for random variables that are nonzero (possibly negative) multiples of standard
Poisson random variables. And, of course, we use the same adjective for
corresponding distributions and characteristic functions.

Let $\nu$ be a finite measure with finite support $\{x_j\colon j = 1,2,\dots,m\} \subset \mathbb{R} \setminus \{0\}$.
For each fixed $j$, the function $u \mapsto \exp[-(1 - e^{iux_j})\,\nu(\{x_j\})]$ is a generalized
Poisson characteristic function. The corresponding distribution has support equal to
the set of nonnegative integral multiples of $x_j$. It follows that the product over
$j = 1,2,\dots,m$ of these characteristic functions, which may be written as

\[
u \mapsto \exp\Bigl(-\int_{\mathbb{R}\setminus\{0\}} (1 - e^{iuy})\,\nu(dy)\Bigr), \tag{16.1}
\]

is the characteristic function of a sum of $m$ independent generalized Poisson
random variables. By an easy limiting argument, using the Continuity Theorem,
we see that we still have a characteristic function even if we drop the
requirement that the finite measure $\nu$ have finite support. A distribution is compound
Poisson if its characteristic function has the form (16.1) for some finite measure
$\nu$ (possibly the zero measure) on $\mathbb{R} \setminus \{0\}$.

Replace $\nu$ by $\frac{1}{n}\nu$ in (16.1) to obtain a compound Poisson characteristic
function whose $n$th power is the characteristic function (16.1) as it stands. Therefore,
all compound Poisson distributions are infinitely divisible.

Another feature that makes compound Poisson distributions central for this 
chapter is the way they arise as limits. We have already seen in Problem 4 
of Chapter 14 that a sum of a large number of independent Bernoulli random 
variables, each having small probability of equaling 1, can have a distribution 
that is close to Poisson. In the following example we generalize this fact. 


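Example 1. Fix a distribution $R$. Let constants

\[
p_{k,n} \in [0,1], \quad 1 \le k \le n,\ n = 1,2,\dots,
\]

satisfy

\[
\lim_{n\to\infty} \max_{1\le k\le n} p_{k,n} = 0 \quad\text{and}\quad \lim_{n\to\infty} \sum_{k=1}^n p_{k,n} = \lambda
\]

for some constant $\lambda \in (0,\infty)$. For $1 \le k \le n$, $n = 1,2,\dots$, we let

\[
Q_{k,n} = (1 - p_{k,n})\,\delta_0 + p_{k,n} R
\]

and set the goal of deciding if

\[
\lim_{n\to\infty} (Q_{1,n} * Q_{2,n} * \dots * Q_{n,n}) \tag{16.2}
\]

exists, and if so of evaluating it.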


In terms of characteristic functions the limit of interest is

\[
\lim_{n\to\infty} \prod_{k=1}^n \bigl((1 - p_{k,n}) + p_{k,n}\gamma(v)\bigr),
\]

where $\gamma$ is the characteristic function of $R$. We introduce well-defined logarithms
as described in Problem 7 of Appendix E, and consider

\[
\lim_{n\to\infty} \sum_{k=1}^n \log\bigl(1 - p_{k,n}(1 - \gamma(v))\bigr).
\]



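Since $|1 - \gamma(v)| \le 2$ we see that the error in using only the first nonzero term of
the power series for the logarithms is no more than

\[
O\Bigl(\sum_{k=1}^n p_{k,n}^2\Bigr) \le O\Bigl(\bigl(\max_{1\le k\le n} p_{k,n}\bigr) \sum_{k=1}^n p_{k,n}\Bigr) \to 0 \quad\text{as } n \to \infty.
\]

Hence, the issue has become: Does

\[
\lim_{n\to\infty} \sum_{k=1}^n -p_{k,n}(1 - \gamma(v))
\]

exist, and if so, what is its value? It is clear from our assumptions that the limit
does exist and is equal to

\[
-\lambda(1 - \gamma(v)) = -\int_{\mathbb{R}\setminus\{0\}} (1 - e^{ivy})\,(\lambda R)(dy).
\]

Therefore, the limit (16.2) exists and equals the compound Poisson distribution
whose characteristic function is

\[
u \mapsto \exp\Bigl(-\int_{\mathbb{R}\setminus\{0\}} (1 - e^{iuy})\,(\lambda R)(dy)\Bigr).
\]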
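The convergence in Example 1 can be observed numerically. The following Python sketch is an added illustration, not the authors' computation. It assumes a particular distribution R = (delta_1 + delta_{-2})/2, the constant lambda = 3, and the triangular array p_{k,n} = 2 lambda k / (n(n + 1)), which sums to lambda and has maximum tending to 0; all of these choices and the function names are hypothetical.

```python
import cmath

LAM = 3.0  # the constant lambda of the example (hypothetical value)

def gamma_R(v):
    """Characteristic function of the assumed R = (delta_1 + delta_{-2}) / 2."""
    return 0.5 * cmath.exp(1j * v) + 0.5 * cmath.exp(-2j * v)

def triangular_product(n, v):
    """Characteristic function of Q_{1,n} * ... * Q_{n,n} at v.

    Uses p_{k,n} = 2 * LAM * k / (n * (n + 1)), which lies in [0, 1] once n >= 5.
    """
    product = 1.0 + 0.0j
    for k in range(1, n + 1):
        p = 2.0 * LAM * k / (n * (n + 1))
        product *= (1.0 - p) + p * gamma_R(v)
    return product

def compound_poisson_cf(v):
    """Limiting characteristic function exp(-LAM * (1 - gamma_R(v)))."""
    return cmath.exp(-LAM * (1.0 - gamma_R(v)))

for n in (10, 100, 1000, 10000):
    print(n, abs(triangular_product(n, 1.0) - compound_poisson_cf(1.0)))
```

The printed differences should decrease roughly like 1/n, matching the order of the error term in the example.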


* Problem 1. [Law of Rare Events] Let $A_{k,n}$, $1 \le k \le n$, $n \ge 1$, be events in a
probability space $(\Omega, \mathcal{F}, P)$ and suppose for each $n$ that $(A_{k,n}\colon 1 \le k \le n)$ is an
independent $n$-tuple. Also, suppose that

\[
\lim_{n\to\infty} \Bigl(\max_{1\le k\le n} P(A_{k,n})\Bigr) = 0 \quad\text{and}\quad \lim_{n\to\infty} \sum_{k=1}^n P(A_{k,n}) = \lambda \in (0,\infty).
\]

Let $Y_n = \sum_{k=1}^n I_{A_{k,n}}$. Prove that $(Y_n\colon n \ge 1)$ converges in distribution to a
standard Poisson random variable with mean $\lambda$.


Problem 2. Explain how Problem 4 of Chapter 14 can be regarded as a special 
case of the preceding problem. 


Problem 3. In Example 1 we assumed $\lambda \in (0,\infty)$. What new issues, if any, arise
if $\lambda = 0$ is permitted?


Example 2. Let $Y_k$, $k = 1,2,\dots$, and $M$ be independent random variables
and suppose that the distribution of $M$ is standard Poisson with mean $\lambda$ and
that the $Y_k$ have a common distribution $R$. Set $S_0 = 0$ and, for $m = 1,2,\dots$,
set $S_m = \sum_{k=1}^m Y_k$. Let us calculate the distribution $Q$ of the random variable
$\omega \mapsto S_{M(\omega)}(\omega)$. For any Borel set $B$ we have

\[
Q(B) = \sum_{m=0}^\infty P[M = m,\ S_m \in B] = \sum_{m=0}^\infty P[M = m]\,P[S_m \in B];
\]

that is,

\[
Q = \sum_{m=0}^\infty \frac{e^{-\lambda}\lambda^m}{m!}\, R^{*m}. \tag{16.3}
\]

Using the Fubini Theorem, the calculation of the characteristic function $\beta$ of
$Q$ in terms of the characteristic function $\gamma$ of $R$, and thus in terms of $R$ itself, is
easy:

\[
\beta(u) = \sum_{m=0}^\infty \frac{e^{-\lambda}\lambda^m}{m!}\,\gamma^m(u) = e^{-\lambda(1-\gamma(u))}
= \exp\Bigl(-\int_{\mathbb{R}\setminus\{0\}} (1 - e^{iuy})\,(\lambda R)(dy)\Bigr).
\]

We see that Q has a compound Poisson distribution. In this example, unlike 
Example 1, we have obtained a formula for Q itself, not just a formula for its 
characteristic function. Of course, this formula is also valid for Example 1. 


We have seen that compound Poisson distributions can be represented as
limits of sequences of distributions of sums of independent generalized Poisson
random variables, and also as the distribution of the sum of a Poisson number of
iid random variables. We have used characteristic functions to show the equivalence
of these two types of representations. The following problem asks for a more direct approach.


Problem 4. Use Problem 18 of Chapter 10, rather than characteristic functions, 
to relate the two types of representations just described. 


Problem 5. [Two-sided Poisson distributions] Let $X$ and $Y$ be iid standard Poisson
random variables with mean $\rho$. Show that: (i) the distribution of $X - Y$ is given
by

\[
P[X - Y = k] = e^{-2\rho}\, I_{|k|}(2\rho), \quad k \in \mathbb{Z},
\]

where

\[
I_r(x) = \sum_{s=0}^\infty \frac{1}{s!\,\Gamma(s+r+1)} \Bigl(\frac{x}{2}\Bigr)^{r+2s},
\]

with $\Gamma$ denoting the gamma function; (ii) the characteristic function of the standard
two-sided Poisson random variable $X - Y$ is the function $u \mapsto e^{-2\rho(1 - \cos u)}$; and
(iii) $X - Y$ is compound Poisson. Also, identify the measure $\nu$, using the notation
of (16.1). The functions $I_r$ are called modified Bessel functions of the first kind.




* Problem 6. For $\rho = 1$ in Problem 5 use a table of Bessel functions to approximate
to three decimal places the density of $X - Y$ with respect to counting measure.


Problem 7. Find a formula for the distribution of the difference of two independent
standard Poisson random variables. Express the answer in terms of modified Bessel
functions of the first kind and the means, not necessarily identical, of the two
Poisson random variables. Also show that the difference is compound Poisson and,
in the notation of (16.1), find the measure $\nu$.


16.2. Infinitely divisible distributions on R 


We begin with a few examples involving stability and infinite divisibility. The 
major issue treated in this section is that of identifying in an explicit manner all 
infinitely divisible distributions on R. 


Example 3. The characteristic functions of the Cauchy distributions
symmetric about 0 are the functions $u \mapsto e^{-a|u|}$ for positive $a$. Not only is each $n$th
root of such a characteristic function another such characteristic function, thus
establishing the infinite divisibility of the Cauchy distributions, but all these
characteristic functions are of the same strict type, thus establishing the strict
stability of the Cauchy distributions symmetric about 0.


Problem 8. Show that every distribution of the same type as those in the preceding 
example is strictly stable. (Comment: The reason the phrase ‘Cauchy type’ is not 
used in this problem is that ‘Cauchy’ is often also used for certain asymmetric 
distributions.) 


Problem 9. Which normal distributions are strictly stable, which are stable, and 
which are infinitely divisible? 


In order not to give a misleading picture we turn to some infinitely divisible 
distributions that are not stable. 


Example 4. The moment generating functions of the standard gamma
distributions are the functions $v \mapsto (1 + v)^{-\gamma}$. We see that, for every $n$, the $n$th
root of such a function is another function of the same form. This observation
establishes the infinite divisibility of the gamma distributions.


Problem 10. Show that the gamma distributions are not stable. 


The next two propositions indicate ways that ‘new’ infinitely divisible distri- 
butions can be obtained from ‘old’ infinitely divisible distributions. A similar 
approach was used in the first section to obtain compound Poisson distributions. 




Proposition 1. Linear combinations of independent infinitely divisible ran- 
dom variables are infinitely divisible. 


Problem 11. Prove the preceding proposition. 


Proposition 2. The limit of a sequence of infinitely divisible distributions is 
an infinitely divisible distribution. 


* Problem 12. Prove the preceding proposition. 


When working with the characteristic function of an infinitely divisible
distribution, it is often convenient to take the logarithm, which according to
Problem 7 of Appendix E can be defined uniquely in any interval about 0 in which the
characteristic function does not have the value 0. The following proposition says that
this interval can be taken to be $\mathbb{R}$ when the characteristic function is infinitely
divisible.


Proposition 3. The complex number 0 does not belong to the image of any 
infinitely divisible characteristic function. 


PROOF. Let $\beta$ denote an infinitely divisible characteristic function. For $n =
1,2,\dots$, let $\gamma_n$ be a characteristic function having the property that $\gamma_n^n = \beta$. In
an interval $J$ about 0 in which $\beta$ and, hence, $\gamma_n$ are different from 0, we may
take logarithms and divide by $n$:

\[
\log(\gamma_n(u)) = \frac{1}{n} \log(\beta(u)), \quad u \in J. \tag{16.5}
\]

Let $n \to \infty$ to see that $\gamma_n(u) \to 1$ as $n \to \infty$ for $u \in J$. By Lemma 16 of
Chapter 14, $\gamma_n(u) \to 1$ for every $u \in \mathbb{R}$. In particular, for every $u \in \mathbb{R}$ there
exists an $n$ such that $\gamma_n(u) \ne 0$. It follows that $\beta(u) \ne 0$ for every $u \in \mathbb{R}$. □


We now know that J can be taken to equal R in (16.5), and thus we obtain 
the following corollary. 


Corollary 4. For any infinitely divisible characteristic function $\beta$ and any
positive integer $n$, there is exactly one characteristic function $\gamma_n$ having the
property that $\gamma_n^n = \beta$.


In view of the preceding corollary we may, without ambiguity, write $\beta^{1/n}$
for the characteristic function called $\gamma_n$ in the corollary, provided that $\beta$ is an
infinitely divisible characteristic function. If $Q$ is an infinitely divisible
distribution, there is, for each $n$, a unique distribution $Q_n$ such that $Q = Q_n^{*n}$; we
call $Q_n$ the $n$th convolution root of $Q$. Proposition 3 and Corollary 4 have the
following easy consequence.


Corollary 5. As $n \to \infty$, the $n$th convolution root of an infinitely divisible
distribution converges to the delta distribution at 0.


The following proposition shows that compound Poisson distributions are 
more than just examples of infinitely divisible distributions. 


Proposition 6. Every infinitely divisible distribution is the limit of some se- 
quence of compound Poisson distributions. 


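PROOF. Let $\beta$ be an infinitely divisible characteristic function and let $Q_n$
denote the distribution corresponding to $\beta^{1/n}$. By Corollary 5, $\beta^{1/n}(u) \to 1$ as
$n \to \infty$ for all $u \in \mathbb{R}$, so

\[
\frac{\log \beta^{1/n}(u)}{-(1 - \beta^{1/n}(u))} \to 1 \quad\text{as } n \to \infty.
\]

Since $\log\circ\beta = n \log\circ\beta^{1/n}$ for all $n$, we see that

\[
\begin{aligned}
\log(\beta(u)) &= \lim_{n\to\infty} n \log(\beta^{1/n}(u)) \\
&= -\lim_{n\to\infty} n\bigl(1 - \beta^{1/n}(u)\bigr) \\
&= -\lim_{n\to\infty} \int (1 - e^{iuy})\, nQ_n(dy)
\end{aligned}
\]

for all $u \in \mathbb{R}$. This last expression is the limit of a sequence of logarithms of
compound Poisson characteristic functions, with $nQ_n$ in the $n$th term of the
sequence playing the role that $\nu$ plays in (16.1). □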
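The limit used in this proof is easy to watch in a case where the convolution roots are explicit. The Python sketch below is an added illustration that assumes the standard normal distribution, for which beta(u) = exp(-u^2/2) and beta^{1/n}(u) = exp(-u^2/(2n)); the function names are hypothetical. It evaluates the compound Poisson exponent n(1 - beta^{1/n}(u)) and compares it with -log beta(u) = u^2/2.

```python
import math

def beta(u):
    """Characteristic function of the standard normal distribution."""
    return math.exp(-0.5 * u * u)

def compound_poisson_exponent(u, n):
    """n * (1 - beta^{1/n}(u)), computed with expm1 to limit cancellation."""
    return -n * math.expm1(-0.5 * u * u / n)

u = 1.7  # hypothetical argument
for n in (1, 10, 100, 10000, 1000000):
    approx = math.exp(-compound_poisson_exponent(u, n))
    print(n, approx, beta(u))
```

As n grows, exp(-n(1 - beta^{1/n}(u))) approaches beta(u), which is the statement that the compound Poisson characteristic functions converge to the given one.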


16.3. Lévy-Khinchin representations 


As we work our way towards a characterization of infinitely divisible
distributions, the function $\chi$ illustrated in Figure 16.1 will play a special role. Some
flexibility in the definition of this function is possible, but once a choice has been
made, it is best to stay with that choice. A formula for the choice we have made
is $\chi(y) = (y \wedge 1) \vee (-1)$.


* Problem 13. Let $\eta \in \mathbb{R}$, $\sigma \in \mathbb{R}^+$, $\nu$ be a finite measure on $\mathbb{R} \setminus \{0\}$, and $\chi$ be as
shown in Figure 16.1. Define $\psi\colon \mathbb{R} \to \mathbb{C}$ by

\[
\psi(u) = -i\eta u + \frac{\sigma^2 u^2}{2} + \int_{\mathbb{R}\setminus\{0\}} \bigl(1 - e^{iuy} + iu\chi(y)\bigr)\,\nu(dy).
\]

Explain why the function $\exp\circ(-\psi)$ is the characteristic function of an infinitely
divisible distribution and why every compound Poisson characteristic function has
this form.




FIGURE 16.1. The function $\chi$


We will eventually generalize the preceding problem to encompass all infinitely
divisible distributions. The first step in that direction is a definition. A measure
$\nu$ on $\mathbb{R} \setminus \{0\}$ is called a Lévy measure if

\[
\int_{\mathbb{R}\setminus\{0\}} (y^2 \wedge 1)\,\nu(dy) < \infty. \tag{16.6}
\]

The next lemma establishes a useful connection, via Radon-Nikodym derivatives,
between Lévy measures and general finite measures on $\mathbb{R}$.


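Lemma 7. The relation

\[
\frac{d\zeta}{d(\sigma^2\delta_0 + \nu)}(y) =
\begin{cases}
1 & \text{if } y = 0 \\[0.5ex]
6\bigl(1 - \frac{\sin y}{y}\bigr) & \text{if } y \ne 0,
\end{cases} \tag{16.7}
\]

where $\delta_0$ denotes the delta distribution at 0, establishes a one-to-one
correspondence between the set of all finite measures $\zeta$ on $\mathbb{R}$ and the set of all pairs $(\sigma, \nu)$
where $\sigma$ is a nonnegative real number and $\nu$ is a Lévy measure.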


PROOF. Since the function on the right of (16.7) is nonnegative
and finite everywhere, (16.7) defines a $\sigma$-finite measure $\zeta$ whenever $\nu$ is a $\sigma$-finite
measure and $\sigma \ge 0$, by Proposition 17 of Chapter 8. Thus $\zeta$ is determined by
the pair $(\sigma, \nu)$.

For the other direction, we use the Reciprocal Rule for densities (Problem 33
of Chapter 8) to solve (16.7) for the density of $\sigma^2\delta_0 + \nu$ with respect to $\zeta$:

\[
\frac{d(\sigma^2\delta_0 + \nu)}{d\zeta}(y) =
\begin{cases}
1 & \text{if } y = 0 \\[0.5ex]
\dfrac{1}{6\bigl(1 - \frac{\sin y}{y}\bigr)} & \text{if } y \ne 0.
\end{cases}
\]

Thus, the measure $\sigma^2\delta_0 + \nu$ is determined by $\zeta$. Since $\delta_0$ and $\nu$ are mutually
singular, $\zeta$ determines $\sigma$ and $\nu$. Since the density is everywhere finite, $\nu$ is $\sigma$-finite
if $\zeta$ is $\sigma$-finite.




It remains to show that if $\zeta$ is finite, then the measure $\nu$ determined by $\zeta$
is a Lévy measure, and if $\nu$ is a Lévy measure and $\sigma \ge 0$, then the measure $\zeta$
determined by $(\sigma, \nu)$ is finite. Both of these statements follow immediately from
the definition of Lévy measure and the fact that, by the l'Hospital Rule, the
quotient

\[
\frac{y^2 \wedge 1}{6\bigl(1 - \frac{\sin y}{y}\bigr)}
\]

is bounded away from 0 and from $\infty$. □
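The boundedness of this quotient can be inspected numerically. The sketch below is an added illustration, not a proof: it evaluates the quotient on a finite grid of positive y (the function is even in y), with the grid range and the name `quotient` chosen only for this example. Very small |y| are avoided because the subtraction 1 - sin(y)/y loses precision there.

```python
import math

def quotient(y):
    """min(y^2, 1) / (6 * (1 - sin(y)/y)) for y != 0."""
    return min(y * y, 1.0) / (6.0 * (1.0 - math.sin(y) / y))

grid = [0.001 * k for k in range(1, 50001)]   # y from 0.001 to 50
values = [quotient(y) for y in grid]
print(min(values), max(values))
print(quotient(0.001))   # close to 1, reflecting 1 - sin(y)/y ~ y^2/6 near 0
```

On this grid the values stay between roughly 0.137 and 1.052, consistent with the quotient being bounded away from 0 and from infinity.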


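Lemma 8. Let $\eta$ be a real constant, $\zeta$ a finite measure on $\mathbb{R}$, and $\chi$ the
function in Figure 16.1. Define the function $\psi$ by

\[
\psi(u) = -i\eta u + \int \left.
\begin{cases}
u^2/2 & \text{if } y = 0 \\[0.5ex]
\dfrac{1 - e^{iuy} + iu\chi(y)}{6\bigl(1 - \frac{\sin y}{y}\bigr)} & \text{if } y \ne 0
\end{cases}
\right\} \zeta(dy). \tag{16.8}
\]

Then $\exp\circ(-\psi)$ is an infinitely divisible characteristic function.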


PROOF. The result is obvious if $\zeta$ is the zero measure. For other $\zeta$, we define
finite measures $\zeta_k$, $k = 1,2,\dots$, on $\mathbb{R}$ by

\[
\zeta_k(B) = \zeta\bigl(B \cap \{y\colon y = 0 \text{ or } |y| > \tfrac{1}{k}\}\bigr),
\]

noting that, for sufficiently large $k$, $\zeta_k$ is not the zero measure. Define $\psi_k$ by
(16.8) with $\zeta_k$ in place of $\zeta$. A Dominated Convergence argument shows that
$\psi$ is continuous and Lemma 7 and Problem 13 imply that $\exp\circ(-\psi_k)$ is an
infinitely divisible characteristic function for each $k$. We will finish the proof by
showing that, for each $u$, $\psi_k(u) \to \psi(u)$ as $k \to \infty$, for then it will follow by the
Continuity Theorem and Proposition 2 that $\exp\circ(-\psi)$ is an infinitely divisible
characteristic function.

As $k \to \infty$, the sequence $(\zeta_k(\mathbb{R}))$ of numbers converges to the number $\zeta(\mathbb{R})$
by Continuity of Measure, and the sequence $\bigl(\frac{1}{\zeta_k(\mathbb{R})}\zeta_k\bigr)$ of probability measures
converges to the probability measure $\frac{1}{\zeta(\mathbb{R})}\zeta$ by the Portmanteau Theorem. It
follows that $\int g\,d\zeta_k \to \int g\,d\zeta$ as $k \to \infty$ for any bounded continuous $g$. For fixed
$u$ the integrand in (16.8) is a bounded continuous function. □


The next lemma is called a lemma rather than a theorem only because it is 
incorporated in a later theorem. 


Lemma 9. Let $\eta$ be a real number, $\sigma$ a nonnegative real number, $\nu$ a Lévy
measure, and $\chi$ the function shown in Figure 16.1. Define $\psi$ by

\[
\psi(u) = -i\eta u + \frac{\sigma^2 u^2}{2} + \int_{\mathbb{R}\setminus\{0\}} \bigl(1 - e^{iuy} + iu\chi(y)\bigr)\,\nu(dy). \tag{16.9}
\]

Then $\exp\circ(-\psi)$ is an infinitely divisible characteristic function.


PROOF. Apply Lemma 7 and Lemma 8. □




* Problem 14. Let $X$ be a random variable having the characteristic function given
in the preceding lemma. Prove that

\[
E(X) = \eta + \int_{\mathbb{R}\setminus\{0\}} \bigl(y - \chi(y)\bigr)\,\nu(dy)
\]

in the sense that if either side exists, then so does the other and they are equal.
Show also that if $E(|X|) < \infty$, then

\[
\mathrm{Var}(X) = \sigma^2 + \int_{\mathbb{R}\setminus\{0\}} y^2\,\nu(dy).
\]


Lemma 9 identifies a certain class of infinitely divisible distributions. The goal
of the next two lemmas is a proof that such an infinitely divisible distribution
can arise from only one triple $(\eta, \sigma, \nu)$. It will be seen near the end of this section
that this class includes all infinitely divisible distributions.


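Lemma 10. Let $\psi$ be related to $\eta$ and $\zeta$ as in Lemma 8. Then

\[
\zeta(\mathbb{R}) = 3 \int_0^1 [\psi(v) + \psi(-v)]\,dv \tag{16.10}
\]

and, if $\zeta(\mathbb{R}) \ne 0$,

\[
\int e^{iuy}\,\frac{\zeta(dy)}{\zeta(\mathbb{R})} = \frac{3}{\zeta(\mathbb{R})} \int_0^1 [\psi(u+v) - 2\psi(u) + \psi(u-v)]\,dv. \tag{16.11}
\]

PROOF. From (16.9) we obtain

\[
\psi(u+v) - 2\psi(u) + \psi(u-v) = \sigma^2 v^2 + 2\int_{\mathbb{R}\setminus\{0\}} e^{iuy}(1 - \cos vy)\,\nu(dy).
\]

Multiplication by 3 and integration on $v$ from 0 to 1 using the Fubini Theorem
gives

\[
3\int_0^1 [\psi(u+v) - 2\psi(u) + \psi(u-v)]\,dv
= \sigma^2 + \int_{\mathbb{R}\setminus\{0\}} e^{iuy}\, 6\Bigl(1 - \frac{\sin y}{y}\Bigr)\,\nu(dy) = \int e^{iuy}\,\zeta(dy),
\]

from which (16.10) and (16.11) follow. □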
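The identity behind (16.10) and (16.11) can be checked numerically for a simple triple. The Python sketch below is an added illustration under assumptions: the Lévy measure is a single point mass, nu = C delta_{Y0}, the constants ETA, SIGMA, C, Y0 are hypothetical, and the integral over v is approximated by Simpson's rule. It compares three times the integrated second difference of psi with the transform of zeta = sigma^2 delta_0 + 6(1 - sin(y)/y) nu.

```python
import cmath
import math

ETA, SIGMA, C, Y0 = 0.7, 1.3, 2.0, 2.5   # hypothetical triple with nu = C * delta_{Y0}

def chi(y):
    """The truncation function chi(y) = (y ∧ 1) ∨ (-1)."""
    return max(min(y, 1.0), -1.0)

def psi(u):
    """Characteristic exponent of the form (16.9) for the triple above."""
    return (-1j * ETA * u + 0.5 * (SIGMA * u) ** 2
            + C * (1.0 - cmath.exp(1j * u * Y0) + 1j * u * chi(Y0)))

def zeta_transform(u):
    """Integral of e^{iuy} against zeta = sigma^2 delta_0 + 6 (1 - sin(y)/y) nu."""
    return SIGMA ** 2 + 6.0 * (1.0 - math.sin(Y0) / Y0) * C * cmath.exp(1j * u * Y0)

def second_difference_integral(u, steps=2000):
    """3 * integral over v in [0, 1] of psi(u+v) - 2 psi(u) + psi(u-v), by Simpson's rule."""
    h = 1.0 / steps                       # steps must be even for Simpson's rule
    g = lambda v: psi(u + v) - 2.0 * psi(u) + psi(u - v)
    total = g(0.0) + g(1.0)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * g(i * h)
    return 3.0 * (h / 3.0) * total

for u in (0.0, 0.9, -2.0):
    print(u, second_difference_integral(u), zeta_transform(u))
```

At u = 0 the two printed values give the total mass of zeta, which is the content of (16.10) since psi(0) = 0.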


Lemma 11. The mapping $(\eta, \sigma, \nu) \mapsto \psi$ defined by (16.9), where $\eta \in \mathbb{R}$,
$\sigma \in \mathbb{R}^+$, and $\nu$ is a Lévy measure, is one-to-one.


PROOF. Let $\zeta$ correspond to $(\sigma, \nu)$, as in Lemma 7, in which case that lemma
and the Reciprocal Rule for densities imply that $\psi$ is given by (16.8). Thus, if
we can show that $\psi$ determines $\zeta$, it will follow from (16.8) that it determines $\eta$
and from Lemma 7 that it determines $\sigma$ and $\nu$, thus establishing one-to-oneness.

That the real number $\zeta(\mathbb{R})$ is determined by $\psi$ is an immediate consequence
of (16.10). If $\zeta(\mathbb{R}) \ne 0$, the probability measure $\frac{1}{\zeta(\mathbb{R})}\zeta$ is determined by its
characteristic function, which by (16.11) is determined by $\psi$. Hence, $\zeta$ itself is
determined by $\psi$. □


By Proposition 2 limits of sequences of infinitely divisible distributions are
infinitely divisible. In view of the yet to be proven fact that every infinitely
divisible distribution corresponds to a triple $(\eta, \sigma, \nu)$ via the relations in Lemma 9,
it is natural to look for convergence conditions in terms of such triples.


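Theorem 12. Let $((\eta_n, \sigma_n, \nu_n)\colon n = 1,2,\dots)$ be a sequence of triples such
that, for each $n$, $\eta_n \in \mathbb{R}$, $\sigma_n \in \mathbb{R}^+$, and $\nu_n$ is a Lévy measure. For each $n$, let $Q_n$
be the infinitely divisible distribution corresponding to $(\eta_n, \sigma_n, \nu_n)$ via Lemma 9.
Then the sequence $(Q_n\colon n = 1,2,\dots)$ converges if and only if there exist $\eta \in \mathbb{R}$,
$\sigma \in \mathbb{R}^+$, and a Lévy measure $\nu$ for which the following three conditions all hold:

\[
\nu(B) = \lim_{n\to\infty} \nu_n(B) \quad\text{for closed intervals $B$ such that $0 \notin B$ and $\nu(\partial B) = 0$;}
\]

\[
\begin{aligned}
\sigma^2 &= \lim_{\varepsilon\searrow 0} \limsup_{n\to\infty} \Bigl(\sigma_n^2 + \int_{[-\varepsilon,\varepsilon]\setminus\{0\}} y^2\,\nu_n(dy)\Bigr) \\
&= \lim_{\varepsilon\searrow 0} \liminf_{n\to\infty} \Bigl(\sigma_n^2 + \int_{[-\varepsilon,\varepsilon]\setminus\{0\}} y^2\,\nu_n(dy)\Bigr);
\end{aligned}
\]

\[
\eta = \lim_{n\to\infty} \eta_n.
\]

In case these conditions are satisfied the limit of the sequence $(Q_n\colon n \ge 1)$ is the
infinitely divisible distribution corresponding to $(\eta, \sigma, \nu)$ via Lemma 9.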


Problem 15. Prove the preceding theorem. Hint: Keep in mind that the assertion 
that when the limit exists, it has the form described in Lemma 9, is part of the 
conclusion, not one of the hypotheses. 


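Problem 16. Let $X_\lambda$ denote a standard Poisson random variable having mean $\lambda$,
and let $Y$ be a normal random variable having mean 0 and variance 1. Use the
preceding theorem to prove that

\[ \frac{X_\lambda - \lambda}{\sqrt{\lambda}} \to Y \quad \text{in distribution} \]

as $\lambda \to \infty$.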
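
A numerical sanity check (not a proof) of the kind of convergence Problem 16 asks about can be sketched as follows. The sketch assumes the sign convention of this chapter, in which a characteristic function equals exp of minus its characteristic exponent, and it compares the exponent of the centered and scaled Poisson variable $(X_\lambda - \lambda)/\sqrt{\lambda}$ with $u^2/2$, the exponent of a standard normal variable. The function names are illustrative only.

```python
import cmath
import math


def poisson_normalized_exponent(u: float, lam: float) -> complex:
    """Characteristic exponent psi of (X - lam)/sqrt(lam), X Poisson with mean lam.

    Convention: E exp(iu(X - lam)/sqrt(lam)) = exp(-psi(u)).
    """
    s = math.sqrt(lam)
    # E exp(ivX) = exp(-lam * (1 - e^{iv})) with v = u/s; the centering by lam
    # multiplies the characteristic function by exp(-i*u*s).
    return lam * (1 - cmath.exp(1j * u / s)) + 1j * u * s


def normal_exponent(u: float) -> float:
    """Characteristic exponent of a normal distribution with mean 0, variance 1."""
    return u * u / 2


if __name__ == "__main__":
    for lam in (1e2, 1e4, 1e6):
        gap = abs(poisson_normalized_exponent(2.0, lam) - normal_exponent(2.0))
        print(f"lambda = {lam:>9.0f}   |psi_lambda(2) - 2| = {gap:.6f}")
```

The gap shrinks roughly like $\lambda^{-1/2}$, which is consistent with the leading correction term $-iu^3/(6\sqrt{\lambda})$ in the expansion of the exponent.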


Here is the theorem, promised earlier, that characterizes infinitely divisible 
distributions. 


Theorem 13. [Lévy-Khinchin Representation] In the $\mathbb{R}$-setting there is a
one-to-one correspondence between the set of triples $(\eta,\sigma,\nu)$, where $\eta \in \mathbb{R}$,
$\sigma \in \mathbb{R}^+$, and $\nu$ is a Lévy measure, and the set of infinitely divisible
characteristic functions, with the characteristic function corresponding to the triple
$(\eta,\sigma,\nu)$ being given by $\exp \circ (-\psi)$, where

\[ \psi(u) = -i\eta u + \frac{\sigma^2 u^2}{2} + \int_{\mathbb{R}\setminus\{0\}} \bigl( 1 - e^{iuy} + iu\chi(y) \bigr) \, \nu(dy) \]

and $\chi$ is the function in Figure 16.1.


PROOF. That every $(\eta,\sigma,\nu)$ leads to an infinitely divisible characteristic
function is the content of Lemma 9. That different triples give different characteristic
functions is a consequence of Lemma 11, since Proposition 3 implies that an
infinitely divisible characteristic function has a unique continuous logarithm whose
value at 0 is 0. To see that every infinitely divisible distribution has the given
form, note first that, by Problem 13, every compound Poisson characteristic
function has the form described. Then, by Theorem 12, the limit of any sequence
of compound Poisson characteristic functions has this form. By Proposition 6
every infinitely divisible characteristic function is such a limit. □


The function $\psi$ is called the characteristic exponent of the corresponding
infinitely divisible distribution. The formula for characteristic exponents of
infinitely divisible distributions simplifies nicely for those distributions that are
also symmetric about 0.


Proposition 14. An infinitely divisible distribution characterized by $(\eta,\sigma,\nu)$,
via its Lévy-Khinchin representation, is symmetric about 0 if and only if $\eta = 0$
and $\nu$ is symmetric about 0, in which case its characteristic exponent is

\[ u \mapsto \frac{\sigma^2 u^2}{2} + 2 \int_{(0,\infty)} (1 - \cos uy) \, \nu(dy). \]

* Problem 17. Prove the preceding proposition. 
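
The simplification in the symmetric case can be checked numerically for a purely atomic Lévy measure. The sketch below is only an illustration under stated assumptions: Figure 16.1 is not reproduced here, so `chi` is taken to be the truncation $y \mapsto \max(-1, \min(1, y))$ (any bounded odd function equal to $y$ near 0 would serve for this check), and the atoms and weights are made-up values.

```python
import cmath
import math


def chi(y: float) -> float:
    # ASSUMPTION: stand-in for the function of Figure 16.1 (odd, bounded, chi(y) = y near 0).
    return max(-1.0, min(1.0, y))


def exponent(u, eta, sigma, atoms):
    """Levy-Khinchin exponent for nu = sum of m * delta_y over (y, m) in atoms, y != 0."""
    total = -1j * eta * u + sigma ** 2 * u ** 2 / 2
    for y, m in atoms:
        total += m * (1 - cmath.exp(1j * u * y) + 1j * u * chi(y))
    return total


def symmetric_exponent(u, sigma, positive_atoms):
    """Simplified formula for the symmetric case, using only the atoms in (0, infinity)."""
    return sigma ** 2 * u ** 2 / 2 + 2 * sum(
        m * (1 - math.cos(u * y)) for y, m in positive_atoms
    )


# Hypothetical symmetric Levy measure: equal mass at y and -y.
positive_atoms = [(0.5, 1.2), (3.0, 0.4)]
atoms = positive_atoms + [(-y, m) for y, m in positive_atoms]

if __name__ == "__main__":
    for u in (0.3, 1.7, -2.5):
        print(u, exponent(u, 0.0, 0.8, atoms), symmetric_exponent(u, 0.8, positive_atoms))
```

With $\eta = 0$ the two values agree and the exponent is real, because the terms involving `chi` cancel in pairs. A nonzero $\eta$ leaves the imaginary part $-\eta u$.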


16.4. Infinitely divisible distributions on $\mathbb{R}^+$


Nonnegative infinitely divisible distributions are particularly important and the 
relevant theory is somewhat simpler than the general theory for the R-setting. 
The following problem is a natural place to begin. 


Problem 18. [Compound Poisson Distributions on $\mathbb{R}^+$] Prove directly, without
using a change of variables involving complex numbers, that if the measure $\nu$ in (16.1)
is supported by $(0,\infty)$, then the corresponding compound Poisson distribution is
supported by $\mathbb{R}^+$ and has moment generating function

\[ v \mapsto \exp\Bigl( - \int_{(0,\infty)} (1 - e^{-vy}) \, \nu(dy) \Bigr). \tag{16.12} \]


Problem 19. Let $X$ have the distribution described in the preceding problem.
Verify that $P[X = 0] = e^{-\nu(0,\infty)}$.
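
For a Lévy measure with finitely many atoms, formula (16.12) and the value of $P[X = 0]$ can be checked numerically. The sketch below is an illustration, not a solution of the problems: it assumes the standard representation of such a compound Poisson variable as $X = \sum_j y_j N_j$ with independent Poisson counts $N_j$ of mean $m_j$, and the atoms $(y_j, m_j)$ are made-up values.

```python
import math


def mgf_formula(v, atoms):
    """Right side of (16.12) for nu = sum of m * delta_y over (y, m) in atoms, y > 0."""
    return math.exp(-sum(m * (1 - math.exp(-v * y)) for y, m in atoms))


def mgf_direct(v, atoms, terms=200):
    """E exp(-vX) for X = sum_j y_j N_j, N_j independent Poisson with mean m_j.

    Each factor E exp(-v y_j N_j) is obtained by summing the Poisson series directly.
    """
    result = 1.0
    for y, m in atoms:
        pmf = math.exp(-m)  # P[N_j = 0]
        series = 0.0
        for k in range(terms):
            series += pmf * math.exp(-v * y * k)
            pmf *= m / (k + 1)  # P[N_j = k + 1]
        result *= series
    return result


def prob_zero(atoms):
    """P[X = 0]: all counts must vanish, since every atom y_j is strictly positive."""
    return math.prod(math.exp(-m) for _, m in atoms)


atoms = [(0.5, 2.0), (1.0, 0.7), (4.0, 0.1)]  # hypothetical (y_j, m_j) pairs

if __name__ == "__main__":
    for v in (0.0, 0.3, 2.0):
        print(v, mgf_formula(v, atoms), mgf_direct(v, atoms))
    print(prob_zero(atoms), math.exp(-sum(m for _, m in atoms)))
```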




* Problem 20. Prove that every compound Poisson distribution supported by $[0,\infty)$
has the form (16.12); that is, show that the Lévy measure of a compound Poisson
distribution supported by $[0,\infty)$ is supported by $(0,\infty)$.


The point of the next result, in comparison with Proposition 6, is that here
the compound Poisson distributions may be chosen to have support $\mathbb{R}^+$.


Proposition 15. Every infinitely divisible distribution supported by $\mathbb{R}^+$ is the
limit of some sequence of compound Poisson distributions supported by $\mathbb{R}^+$.


Problem 21. Prove the preceding proposition. Hint: See the proof of Proposition 6. 


A measure $\nu$ on $(0,\infty)$ is a Lévy measure for $\mathbb{R}^+$ if

\[ \int_{(0,\infty)} (y \wedge 1) \, \nu(dy) < \infty. \tag{16.13} \]


By analogy with this terminology, the phrase ‘for R’ may, for clarity, be adjoined 
to the term ‘Lévy measure’ when the discussion concerns infinitely divisible dis- 
tributions on R. 

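Corresponding to a Lévy measure $\nu$ for $\mathbb{R}^+$ and a constant $\xi \in \mathbb{R}^+$, not both
zero, it is convenient to introduce a finite measure $\mu$ on $\mathbb{R}^+$ defined by

\[ \frac{d\mu}{d(\delta_0 + \nu)}(y) = \begin{cases} \xi & \text{if } y = 0 \\ 1 - e^{-y} & \text{if } y \in (0,\infty). \end{cases} \tag{16.14} \]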


Problem 22. Mimic the development from Lemma 7 through Lemma 11 to show
that, for each pair $(\xi,\nu)$, where $\xi \in \mathbb{R}^+$ and $\nu$ is a Lévy measure for $\mathbb{R}^+$, there exists
a unique infinitely divisible distribution on $\mathbb{R}^+$, the moment generating function of
which equals $\exp \circ (-\phi)$, where

\[ \phi(v) = \xi v + \int_{(0,\infty)} (1 - e^{-vy}) \, \nu(dy). \]


The constant € is sometimes called the shift of the corresponding infinitely divisible 
distribution. 


Problem 23. How much of the preceding problem could have been done easily 
using Theorem 27 of Chapter 14? 


Problem 24. Let $X$ be a finite infinitely divisible random variable corresponding
to $(\xi,\nu)$ as in Problem 22. Show that

\[ E(X) = \xi + \int_{(0,\infty)} y \, \nu(dy). \]




* Problem 25. By Problem 58 of Chapter 14 (or Example 4 of this chapter), gamma
distributions are infinitely divisible. Find the shifts and Lévy measures of all
gamma distributions with support equal to $\mathbb{R}^+$.


Here are the analogues for the $\mathbb{R}^+$-setting of Theorem 12 and Theorem 13.


Theorem 16. Let $((\xi_n,\nu_n): n = 1,2,\dots)$ be a sequence of pairs such that,
for each $n$, $\xi_n \in \mathbb{R}^+$ and $\nu_n$ is a Lévy measure for $\mathbb{R}^+$. For each $n$, let $Q_n$ be the
infinitely divisible distribution on $\mathbb{R}^+$ corresponding to $(\xi_n,\nu_n)$ via Problem 22.
Then the sequence $(Q_n: n = 1,2,\dots)$ converges to a distribution on $\mathbb{R}^+$ if and
only if there exist $\xi \in \mathbb{R}^+$ and a Lévy measure $\nu$ for $\mathbb{R}^+$ for which the following
two conditions both hold:

\[ \nu[x,\infty) = \lim_{n\to\infty} \nu_n[x,\infty) \quad \text{if $0 < x$ and $\nu\{x\} = 0$;} \]

\[ \xi = \lim_{\varepsilon \searrow 0} \limsup_{n\to\infty} \Bigl( \xi_n + \int_{(0,\varepsilon]} y \, \nu_n(dy) \Bigr)
 = \lim_{\varepsilon \searrow 0} \liminf_{n\to\infty} \Bigl( \xi_n + \int_{(0,\varepsilon]} y \, \nu_n(dy) \Bigr). \]

In case these conditions are satisfied, the limit of the sequence $(Q_n: n \ge 1)$ is
the infinitely divisible distribution corresponding to $(\xi,\nu)$ via Problem 22.


Theorem 17. [Lévy-Khinchin Representation for $\mathbb{R}^+$] In the $\mathbb{R}^+$-setting there
is a one-to-one correspondence between the set of pairs $(\xi,\nu)$, where $\xi \in \mathbb{R}^+$ and
$\nu$ is a Lévy measure for $\mathbb{R}^+$, and the set of infinitely divisible moment generating
functions for $\mathbb{R}^+$, with the moment generating function corresponding to the pair
$(\xi,\nu)$ being given by $\exp \circ (-\phi)$, where

\[ \phi(v) = \xi v + \int_{(0,\infty)} (1 - e^{-vy}) \, \nu(dy). \]


Problem 26. Prove the preceding two theorems. 


Problem 27. Prove that every infinitely divisible moment generating function for
$\mathbb{R}^+$ has the form described in Theorem 27 of Chapter 14.


Problem 28. Discuss why it is that certain Lévy measures $\nu$ for $\mathbb{R}$ for which
$\nu(-\infty,0) = 0$ have not been designated as Lévy measures for $\mathbb{R}^+$.


16.5. Extension to $\overline{\mathbb{R}}^+$


In the $\mathbb{R}$-setting, the sequence of $n^{\text{th}}$ convolution roots of an infinitely divisible
distribution converges to the delta distribution at 0 (see Corollary 5). In the
$\overline{\mathbb{R}}^+$-setting we incorporate this property into the definition.


Definition 18. A distribution $Q$ on $\overline{\mathbb{R}}^+$ is infinitely divisible if there exists a
sequence $(Q_n: n = 1,2,\dots)$ of distributions on $\overline{\mathbb{R}}^+$ such that $Q = Q_n^{*n}$ for each
$n$ and $\lim_{n\to\infty} Q_n = \delta_0$, where $\delta_0$ denotes the delta distribution at 0.


Problem 29. Prove that Qn in the preceding definition is necessarily unique. Hint: 
Use moment generating functions. 


Problem 30. Prove that there is only one distribution $Q$ on $\overline{\mathbb{R}}^+$, namely the delta
distribution at $\infty$, that satisfies the first condition in Definition 18 (that $Q_n^{*n} = Q$
for some distributions $Q_n$ on $\overline{\mathbb{R}}^+$) but not the second condition (that
$\lim_{n\to\infty} Q_n = \delta_0$).


Problem 31. Show that a generalized Poisson distribution with support equal to
$\{0,\infty\}$ is infinitely divisible by finding an explicit formula for its convolution roots.


A measure $\nu$ on $(0,\infty]$ is a Lévy measure for $\overline{\mathbb{R}}^+$ if

\[ \int_{(0,\infty]} (y \wedge 1) \, \nu(dy) < \infty. \tag{16.15} \]


Problem 32. Carry out for the $\overline{\mathbb{R}}^+$-setting the analogue of the program requested
in Problem 22 for the $\mathbb{R}^+$-setting.


It is possible for a sequence of infinitely divisible distributions on $\overline{\mathbb{R}}^+$ to
converge to the delta distribution at $\infty$, which according to Definition 18 is not
infinitely divisible. This fact makes the wording of an analogue of Theorem 16
slightly more complicated than that of Theorem 16 itself.


* Problem 33. State and prove an analogue of Theorem 16 for the $\overline{\mathbb{R}}^+$-setting.


Problem 34. State and prove an analogue of Theorem 17 for the $\overline{\mathbb{R}}^+$-setting.


16.6. The triangular array problem: introduction 


The remainder of this chapter is concerned with the limiting behavior of se- 
quences of distributions of sums of independent random variables. 

Example 1 describes a situation in which a compound Poisson distribution 
arises naturally as a limit connected with sums of independent random variables. 
The result in that example differs from the Weak Law of Large Numbers or the 
Classical Central Limit Theorem, because it is not centering or scaling that 
enables one to obtain a limit, but rather a more complicated changing of the 
probability structure. A general framework designed to accommodate both types 
of settings is the focus of this section. 


Let $X_{k,n}$, $1 \le k \le n$, $1 \le n < \infty$, be $\mathbb{R}$-valued random variables. Assume,
for each $n$, that $(X_{k,n}: k = 1,\dots,n)$ is an independent $n$-tuple and let
$S_n = \sum_{k=1}^n X_{k,n}$. Our goal is to give a criterion for the sequence $(S_n: n = 1,2,\dots)$
to converge in distribution, and, in the case of convergence, to identify the limit.
It is common to depict $(X_{k,n}: 1 \le k \le n,\ n = 1,2,\dots)$ as follows:

\[
\begin{array}{lll}
X_{1,1} \\
X_{1,2} & X_{2,2} \\
X_{1,3} & X_{2,3} & X_{3,3} \\
\vdots
\end{array}
\]


For this reason, it is called a triangular array, the independence described above 
is called row-wise independence, and the sums Sn are called row sums. Since 
the issue is that of convergence in distribution of the sequence of row sums, 
the joint distributions of random variables in different rows are irrelevant. In 
particular, the random variables in different rows need not be defined on the 
same probability space. 

Sometimes one wants to consider a generalization in which the requirement 
that there be exactly $n$ entries in the $n^{\text{th}}$ row of the array be dropped. One
can easily turn such an array of random variables into a triangular array by first 
duplicating some rows to obtain an array in which none of the rows are too long, 
and then by inserting 0’s into rows that are too short. 
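
The duplicate-and-pad recipe just described can be sketched in code. This is a minimal illustration under stated assumptions: the entries are plain numbers standing in for random variables, the input is a finite list of rows of arbitrary lengths, and, when the first source row is longer than the current row index, the sketch emits rows of 0's until it fits (a choice made here for the illustration, which changes only finitely many row sums). The name `to_triangular` is hypothetical.

```python
def to_triangular(rows, n_rows):
    """Turn rows of arbitrary lengths into a triangular array with n entries in row n.

    Source rows are used in order and never skipped: a source row is repeated
    until the next one fits, and short rows are padded with 0's.
    """
    out = []
    j = -1  # index of the source row currently in use; -1 means none fits yet
    for n in range(1, n_rows + 1):
        # Move on to the next source row as soon as it fits into n slots.
        if j + 1 < len(rows) and len(rows[j + 1]) <= n:
            j += 1
        source = rows[j] if j >= 0 else []
        out.append(list(source) + [0] * (n - len(source)))
    return out


if __name__ == "__main__":
    for row in to_triangular([[5], [1, 2, 3], [4, 4]], 5):
        print(row, "row sum:", sum(row))
```

Because rows are only repeated and padded with 0's, the sequence of row sums of the new array runs through the original row sums in order, with repetitions, so its limiting behavior is unchanged.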

The triangular array problem is this: given a row-wise independent triangular 
array, characterize the limiting behavior of the sequence of distributions of the 
row sums in terms of data about the distributions of the row summands. In 
the succeeding sections of this chapter, we give a fairly detailed solution to this 
problem. In the remainder of this section, we introduce some simple concepts 
which will play an important role in this solution. 

It turns out that the best way to approach the triangular array problem is 
via characteristic functions (or moment generating functions in the $\mathbb{R}^+$-setting).
The characteristic function of the row sum is the product of the characteristic 
functions of the row summands, and as we have already seen, it is useful to 
take logarithms when dealing with such products. Of course, the logarithm of 
a characteristic function must be carefully defined because of the singularity of 
the logarithm at 0. This difficulty does not occur when dealing with infinitely 
divisible distributions, as we have already seen when we defined characteristic 
exponents. In the following paragraphs, we generalize the definitions of charac- 
teristic exponent. 

Let $Q$ be a distribution on $\mathbb{R}$ with characteristic function $\beta$, and let $J_\beta$ be
the largest open interval containing 0 on which $\beta$ is never 0. The characteristic
exponent of $Q$ (and of $\beta$) is the function

\[ u \mapsto -\log \circ \beta(u), \quad u \in J_\beta, \]

where $\log \circ \beta$ denotes the unique continuous logarithm of $\beta$ on $J_\beta$ whose value at
0 is 0 (see Problem 7 of Appendix E). The following result is an easy consequence
of the Continuity Theorem of Chapter 14 and Problem 8 of Appendix E:


Proposition 19. Let $(Q_n: n = 1,2,\dots)$ be a sequence of distributions on $\mathbb{R}$,
and let $(\psi_n: n = 1,2,\dots)$ be the corresponding sequence of characteristic
exponents. If $Q_n \to Q$ as $n \to \infty$, where $Q$ is a distribution on $\mathbb{R}$ with characteristic
exponent $\psi$, then $\psi_n(u) \to \psi(u)$ for all $u$ in the domain of $\psi$. Moreover, the
convergence is uniform on compact subsets of the domain of $\psi$.
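
The difference between the continuous logarithm used in the definition of the characteristic exponent and the principal logarithm can be seen numerically. The sketch below is illustrative only: it approximates the continuous logarithm by accumulating small increments along a grid from 0 to the target point, and the test function is the characteristic function of a normal distribution with mean 5 and variance 1 (a made-up example). The names are hypothetical.

```python
import cmath


def characteristic_exponent(beta, u_end, steps=10_000):
    """Approximate -log∘beta(u_end) by following the continuous logarithm from u = 0.

    beta: characteristic function (float -> complex) with beta(0) = 1, assumed
    nonzero on the whole segment from 0 to u_end.
    """
    log_value = 0j
    previous = beta(0.0)
    for k in range(1, steps + 1):
        current = beta(u_end * k / steps)
        if current == 0:
            raise ValueError("beta vanishes; u_end lies outside the domain")
        # On a fine grid the ratio is close to 1, so its principal logarithm is
        # the correct small increment of the continuous logarithm.
        log_value += cmath.log(current / previous)
        previous = current
    return -log_value


def beta_normal(u, mean=5.0):
    """Characteristic function of a normal distribution with the given mean, variance 1."""
    return cmath.exp(1j * mean * u - u * u / 2)


if __name__ == "__main__":
    print("continuous:", characteristic_exponent(beta_normal, 3.0))
    print("principal: ", -cmath.log(beta_normal(3.0)))
```

For this example the exact exponent at $u = 3$ is $4.5 - 15i$. The principal logarithm gives the same real part but an imaginary part that is off by a multiple of $2\pi$.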


We have one other issue to discuss now. It turns out that this issue is not 
relevant for triangular arrays in which the summands within each given row are 
identically distributed, so the reader who is only interested in that special case 
may skip ahead to the next section. 

For the general triangular array problem, we will not be concerned with cases 
in which a relatively small number of summands in a row significantly affect 
the distribution of the row sum, so we wish to introduce a condition designed 
to eliminate such examples. We have already encountered a special case of this 
condition in Example 1, where it was assumed that $\max\{p_{k,n}: 1 \le k \le n\} \to 0$.
For an arbitrary triangular array $(X_{k,n}: k \le n = 1,2,\dots)$ the condition is

\[ \lim_{n\to\infty} \sup_k P[\,|X_{k,n}| > \varepsilon\,] = 0 \quad \text{for all } \varepsilon > 0. \]


A triangular array satisfying this assumption is uniformly asymptotically neg- 
ligible, abbreviated uan. Typically, we use adjectives introduced for random 
variables also for entities that are associated with the random variables; for 
example, we speak of a ‘uan triangular array of characteristic functions’. 

If the uan condition is not assumed in a general triangular array, any limiting 
distribution R is possible, since one summand in each row could have the distri- 
bution R, and the other summands could all equal 0. When the uan condition 
is in force, a fixed finite number of summands in each row cannot influence the 
limiting distribution of the row sums. We will find that, under the uan condition, 
only infinitely divisible distributions are possible as limits. 
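
The uan condition can be evaluated explicitly in simple cases. The sketch below contrasts two made-up arrays of centered normal summands: in array (a) every summand in row $n$ has variance $1/n$, and in array (b) the first summand of every row is standard normal while the others are identically 0, as in the remark above about one summand carrying the whole limit. The function names are hypothetical.

```python
import math


def normal_tail(std: float, eps: float) -> float:
    """P[|X| > eps] for X normal with mean 0 and standard deviation std (std = 0: X = 0)."""
    if std == 0:
        return 0.0
    return math.erfc(eps / (std * math.sqrt(2)))


def uan_sup(row_stds, eps):
    """sup over k of P[|X_{k,n}| > eps] for one row, given the summands' standard deviations."""
    return max(normal_tail(s, eps) for s in row_stds)


def row_a(n):
    """Array (a): every summand in row n is N(0, 1/n)."""
    return [1 / math.sqrt(n)] * n


def row_b(n):
    """Array (b): the first summand is N(0, 1); the remaining n - 1 are identically 0."""
    return [1.0] + [0.0] * (n - 1)


if __name__ == "__main__":
    for n in (10, 100, 10_000):
        print(n, uan_sup(row_a(n), 0.1), uan_sup(row_b(n), 0.1))
```

For array (a) the supremum tends to 0 for every fixed $\varepsilon$, so the array is uan. For array (b) it stays at the same positive value in every row, so the uan condition fails even though the row sums trivially converge.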

As mentioned earlier, characteristic exponents will play an important role in 
our solution of the triangular array problem. The following lemma shows that 
the uan condition is just right for ensuring that the characteristic exponents of 
the row summands and row sums are defined on arbitrarily large domains as
$n \to \infty$.


Lemma 20. Let $(\beta_{k,n}: k \le n = 1,2,\dots)$ be a triangular array of
characteristic functions. Then the following four statements are equivalent:


(i) the triangular array is uan;

(ii) $\lim_{n\to\infty} \bigl( \max_{1 \le k \le n} |1 - \beta_{k,n}(u)| \bigr) = 0$ for each $u$;

(iii) $\lim_{n\to\infty} \bigl( \max_{1 \le k \le n} \max\{ |1 - \beta_{k,n}(u)| : |u| \le w \} \bigr) = 0$
for each $w > 0$;

(iv) $\lim_{n\to\infty} \bigl( \max_{1 \le k \le n} \max\{ |\log \circ \beta_{k,n}(u)| : |u| \le w \} \bigr) = 0$
for each $w > 0$.


Problem 35. Prove the preceding lemma. Hint: Use Lemma 11 of Chapter 14. 


In addition to helping us deal with the domains of characteristic exponents, 
Lemma 20 is also quite useful for approximating these functions by using Taylor 
series. Here is the analogue of Lemma 20 for moment generating functions.


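Lemma 21. Let $(\rho_{k,n}: k \le n = 1,2,\dots)$ be a triangular array of moment
generating functions for either the $\mathbb{R}^+$- or $\overline{\mathbb{R}}^+$-setting. Then the following four
statements are equivalent:


(i) the triangular array is uan;

(ii) $\lim_{n\to\infty} \bigl( \max_{1 \le k \le n} (1 - \rho_{k,n}(v)) \bigr) = 0$ for each $v$;

(iii) $\lim_{n\to\infty} \bigl( \max_{1 \le k \le n} \max\{ (1 - \rho_{k,n}(v)) : v \le w \} \bigr) = 0$
for each $w > 0$;

(iv) $\lim_{n\to\infty} \bigl( \max_{1 \le k \le n} \max\{ -\log \circ \rho_{k,n}(v) : v \le w \} \bigr) = 0$
for each $w > 0$.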


Problem 36. Prove the preceding lemma. 


In the next section, we will solve the triangular array problem in the special 
case that the summands within each row are identically distributed. In the 
section following that, two more special cases are considered: summands that 
are symmetrically distributed about 0 and nonnegative summands. In the final 
section of the chapter, the general triangular array problem is solved. 


16.7. Iid triangular arrays


Let $(Q_n: n = 1,2,\dots)$ be a sequence of distributions on $\mathbb{R}$, and consider a
row-wise independent triangular array in which each summand in the $n^{\text{th}}$ row
has distribution $Q_n$. Such a triangular array is an iid triangular array. The
distribution of the $n^{\text{th}}$ row sum is $Q_n^{*n}$, and the triangular array problem in the
iid case is to determine the possible limits of the sequence $(Q_n^{*n}: n = 1,2,\dots)$.
The following theorem solves this problem.


Theorem 22. [iid case] Let $(Q_n: n \ge 1)$ be a sequence of distributions on $\mathbb{R}$.
In order that the sequence $(Q_n^{*n}: n \ge 1)$ converge it is necessary and sufficient
that there exist a triple $(\eta,\sigma,\nu)$, where $\eta \in \mathbb{R}$, $\sigma \in \mathbb{R}^+$, and $\nu$ is a Lévy measure,
satisfying the following three conditions:

\[ \nu(B) = \lim_{n\to\infty} (nQ_n)(B) \quad \text{for closed intervals $B$ such that $0 \notin B$ and $\nu(\partial B) = 0$;} \tag{16.16} \]

\[ \sigma^2 = \lim_{\varepsilon \searrow 0} \limsup_{n\to\infty} \int_{[-\varepsilon,\varepsilon]} x^2 \, (nQ_n)(dx)
 = \lim_{\varepsilon \searrow 0} \liminf_{n\to\infty} \int_{[-\varepsilon,\varepsilon]} x^2 \, (nQ_n)(dx); \tag{16.17} \]

\[ \eta = \lim_{n\to\infty} \int \chi(x) \, (nQ_n)(dx). \tag{16.18} \]

If these conditions are satisfied, the limit of the sequence $(Q_n^{*n}: n \ge 1)$ is the
infinitely divisible distribution corresponding to $(\eta,\sigma,\nu)$ via the Lévy-Khinchin
Representation Theorem, and $(Q_n: n \ge 1)$ converges to the delta distribution at
0.


PROOF. Let $\psi_n$ denote the characteristic exponent of $Q_n$. Suppose that the
sequence $(Q_n^{*n}: n \ge 1)$ converges to some limit $Q$ with characteristic exponent
$\psi$. By Proposition 19, $\psi(u) = \lim n\psi_n(u)$ for each $u$ in the domain of $\psi$. Since
$n \to \infty$, it must be that $\psi_n(u) \to 0$ for such $u$ and hence that $(Q_n: n \ge 1)$
converges to the delta distribution at 0 (Lemma 16 of Chapter 14). In terms of
$Q_n$ we have

\[ \psi(u) = \lim_{n\to\infty} -n \log\Bigl( \int e^{iux} \, Q_n(dx) \Bigr)
 = \lim_{n\to\infty} -n \log\Bigl( 1 - \int (1 - e^{iux}) \, Q_n(dx) \Bigr). \]

The logarithm must be defined for all sufficiently large $n$ (possibly depending on
$u$) and approach 0 as $n \to \infty$. Therefore, the limit is the same as that obtained
by replacing each logarithm by the first term of its Taylor series:

\[ \psi(u) = \lim_{n\to\infty} \int (1 - e^{iux}) \, (nQ_n)(dx). \tag{16.19} \]

Let $J$ be the domain of $\psi$. We wish to show that $J = \mathbb{R}$. We will do so by
showing that $J$ cannot have any finite endpoints. By (16.19),

\[ \Re(\psi(u)) = \lim_{n\to\infty} \int (1 - \cos(ux)) \, (nQ_n)(dx). \]

Since $1 - \cos 2\theta \le 4(1 - \cos\theta)$ for all $\theta \in \mathbb{R}$ (see Problem 37 below), it follows
that $\Re(\psi(2u)) \le 4\Re(\psi(u))$ for all $u \in \mathbb{R}$. Since $\psi$ is continuous on the interior of
$J$, this last inequality implies that $\Re\psi$ cannot become unbounded near any finite
endpoint of $J$. But the definition of $J$ implies that $\Re\psi$ must become unbounded
near any finite endpoint of $J$. Therefore, $J$ cannot have any finite endpoints.


We now rewrite (16.19) using the function $\chi$. Let $\eta_n = \int \chi(x) \, (nQ_n)(dx)$.
Then

\[ \psi(u) = \lim_{n\to\infty} \Bigl( -i\eta_n u + \int_{\mathbb{R}\setminus\{0\}} \bigl( 1 - e^{iux} + iu\chi(x) \bigr) \, (nQ_n)(dx) \Bigr) \tag{16.20} \]

for all $u \in \mathbb{R}$. Let $\nu_n$ be the restriction of $nQ_n$ to $\mathbb{R} \setminus \{0\}$. Then (16.20) expresses
$\psi$ as the limit of a sequence of characteristic exponents of infinitely divisible
distributions, the $n^{\text{th}}$ one of which is characterized by the triple $(\eta_n, 0, \nu_n)$. An
application of Theorem 12 shows that $Q$ is infinitely divisible and characterized
by the triple $(\eta,\sigma,\nu)$, where $\nu$, $\sigma$, and $\eta$ are defined by (16.16), (16.17), and
(16.18).

For the converse suppose that (16.16), (16.17), and (16.18) hold for some
$\eta \in \mathbb{R}$, $\sigma \in \mathbb{R}^+$, and Lévy measure $\nu$, and let $\psi$ denote the characteristic exponent
corresponding to $(\eta,\sigma,\nu)$ via the Lévy-Khinchin Representation Theorem. As
above, let $\nu_n$ be the restriction to $\mathbb{R} \setminus \{0\}$ of $nQ_n$. By Theorem 12, $\psi$ is the
pointwise limit of the sequence of characteristic exponents corresponding to the
triples $(\eta_n, 0, \nu_n)$, where $\eta_n = \int \chi(x) \, \nu_n(dx)$. That is,

\[ \psi(u) = \lim_{n\to\infty} \Bigl( -i\eta_n u + \int_{\mathbb{R}\setminus\{0\}} \bigl( 1 - e^{iux} + iu\chi(x) \bigr) \, \nu_n(dx) \Bigr)
 = \lim_{n\to\infty} n \Bigl( 1 - \int e^{iux} \, Q_n(dx) \Bigr); \tag{16.21} \]

moreover the convergence is uniform for $u$ in any bounded interval, as it always
is for convergence of sequences of characteristic functions to a characteristic
function. In any such interval, the quantity within parentheses in (16.21) must
be uniformly close to 0 and thus, for $n$ sufficiently large, the last integral must
have positive real part. For such $n$, the logarithm of the integral must have
imaginary part between $-\frac{\pi}{2}$ and $\frac{\pi}{2}$ and thus be close to 0 when the quantity
within parentheses is close to 0. Accordingly,

\[ \psi(u) = \lim_{n\to\infty} -n \log\Bigl( \int e^{iux} \, Q_n(dx) \Bigr) = \lim_{n\to\infty} n\psi_n(u). \]

Therefore, $Q_n^{*n} \to Q$, as desired.

(Comment: The reason for the fuss in the latter portion of this proof is the
fact that it is not generally true that characteristic exponents are close to 0 when
characteristic functions are close to 1. That a characteristic function is close to
1 at some point only forces the corresponding characteristic exponent to be close
to an integral multiple of $2\pi i$.) □
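
A small numerical illustration of Theorem 22 and of the first-order replacement leading to (16.19) may be helpful. It uses the made-up choice $Q_n\{1\} = \lambda/n$, $Q_n\{0\} = 1 - \lambda/n$, for which $(nQ_n)\{1\} = \lambda$ for every $n$, so that (16.16) and (16.17) give $\nu = \lambda\delta_1$ and $\sigma = 0$ and the limit is Poisson with mean $\lambda$. The sketch assumes $\lambda/n < 1/2$, which keeps the characteristic function of $Q_n$ within distance 1 of the point 1, so the principal logarithm coincides with the continuous one. The function names are hypothetical.

```python
import cmath


def row_sum_exponent(u: float, n: int, lam: float) -> complex:
    """n * psi_n(u) = -n * log(beta_n(u)), beta_n the characteristic function of Q_n."""
    p = lam / n
    if p >= 0.5:
        raise ValueError("need lam/n < 1/2 so that the principal log is the continuous one")
    beta = 1 - p * (1 - cmath.exp(1j * u))
    return -n * cmath.log(beta)


def first_order_term(u: float, lam: float) -> complex:
    """Integral of (1 - e^{iux}) against n*Q_n; here it equals lam*(1 - e^{iu}) for every n."""
    return lam * (1 - cmath.exp(1j * u))


if __name__ == "__main__":
    lam, u = 3.0, 2.0
    for n in (10**3, 10**4, 10**5):
        gap = abs(row_sum_exponent(u, n, lam) - first_order_term(u, lam))
        print(f"n = {n:>6}   gap = {gap:.6f}")
```

The gap decays like $1/n$, which reflects the size of the second-order term of the logarithm that the proof discards.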


Problem 37. Prove the inequality $1 - \cos 2\theta \le 4(1 - \cos\theta)$, $\theta \in \mathbb{R}$. Hint: For
$\theta \in [0,\pi]$, compare derivatives.


Corollary 23. [iid case] In order that the sequence $(Q_n^{*n}: n \ge 1)$ converge it
is necessary and sufficient that the following three conditions all hold:

\[ \lim_{y\to\infty} \limsup_{n\to\infty} nQ_n(\{x: |x| > y\}) = 0; \tag{16.22} \]

\[ \eta = \lim_{n\to\infty} n \int \chi(x) \, Q_n(dx) \tag{16.23} \]

for some $\eta \in \mathbb{R}$;

\[ \sigma^2 \delta_0[x,y] + \int_{[x,y]\setminus\{0\}} t^2 \, \nu(dt) = \lim_{n\to\infty} n \int_{[x,y]} t^2 \, Q_n(dt)
 \quad \text{if $x, y \ne 0$ and $\nu\{x\} = \nu\{y\} = 0$} \tag{16.24} \]

for some $\sigma \in \mathbb{R}^+$ and Lévy measure $\nu$, where $\delta_0$ denotes the delta distribution at
0. If these conditions are satisfied, the limit of the sequence $(Q_n^{*n}: n \ge 1)$ is the
infinitely divisible distribution corresponding to $(\eta,\sigma,\nu)$.



Problem 38. Prove the preceding corollary. 
Problem 39. Use the preceding theorem or corollary to redo Example 1. 


Problem 40. Study the limiting behavior of the sequence of row sums of the row- 


wise independent triangular array of random variables X;,,, having distributions 


Qk,n given by Qk n{1} = + and Qkn{+} = ==. Approach the problem by using 


Theorem 22 and then again by using moment generating functions. 


* Problem 41. Define Qn by Qn{+} = Qn{-=} = =, Qn{c} = Qn{-c} = #, 
for some constant $c > 1$. Use Theorem 22 to decide if the sequence $(Q_n^{*n}: n \ge 1)$
converges, and if so, identify the limit.


* Problem 42. Define Qn by Qn{1} = Qn{-1} = 4, Qn{n7/?} = Qn{-n 7} = 


n=. Decide if the sequence (Q,": n > 1) converges, and if so, find the Lévy- 


Khinchin representation of the limit. 
Problem 43. Define Qn by Qn{0} = 4 and 
1 1 
hia) = 2 
Decide if $\lim Q_n^{*n}$ exists, and if so, find its Lévy-Khinchin representation.
Problem 44. Define Qn by Qn{0} = $ and 
1 1 
Qnt (2+ pnt mera 


Decide if $\lim Q_n^{*n}$ exists, and if so, find its Lévy-Khinchin representation.



Problem 45. Define Qn by Qn{0} = $ and 
1 1 1 
teria =e ere ma 


Decide if $\lim Q_n^{*n}$ exists, and if so, find its Lévy-Khinchin representation.


Problem 46. Let Qn have density with respect to Lebesgue measure A given by 


fer (x) = (a —|z|) V0) + ae : 


Decide if $\lim Q_n^{*n}$ exists, and if so, find its characteristic exponent.

Problem 47. Apply Theorem 22 to the sequence (Qn) defined by 
Qn{27 7" + (-8) "} = L, 1<m<n. 

If $\lim Q_n^{*n}$ exists, find its characteristic exponent.


Problem 48. Let $(Q_n: n \ge 1)$ be a sequence of distributions on $\mathbb{R}$ such that
$\lim_{n\to\infty} Q_n^{*n}$ exists. Denote the Lévy measure of the limit by $\nu$. Let $c$ be a
positive constant for which $\nu\{-c, c\} = 0$. Let $Y_n$ denote the number of random
variables in the $n^{\text{th}}$ row of a corresponding triangular array which have absolute
value larger than $c$. Prove that $(Y_n: n \ge 1)$ converges in distribution to a standard
Poisson random variable with mean $\nu\{y: |y| > c\}$.


Problem 49. Investigate the question: Can the hypothesis $\nu\{-c, c\} = 0$ be dropped
from the preceding problem?


16.8. Symmetric and nonnegative triangular arrays 


In this section we drop the assumption that all the distributions in each row 
of a triangular array are the same, but we specialize in other ways that make 
the triangular array problem quite simple to treat. Two cases are considered: 
all distributions symmetric about 0, and all distributions supported by $\mathbb{R}^+$. In
each case the relevant theorem can be proved along the lines of the proof of 
Theorem 22. First, the special nature of each case and Lemma 20 or Lemma 21 
justify the use of first-term Taylor approximations, as in the argument leading 
to (16.19). Then it can be seen (as in (16.20)) that such approximations are 
actually the negative logarithms of infinitely divisible characteristic functions or 
moment generating functions, to which either Theorem 12 or Theorem 16 can 
be applied. 

It may be helpful to again read Proposition 14 about infinitely divisible dis- 
tributions that are symmetric about 0 before examining the next theorem. 


Theorem 24. [symmetric case] Let $(Q_{k,n}: 1 \le k \le n,\ n = 1,2,\dots)$ be a uan
triangular array of distributions on $\mathbb{R}$, each of which is symmetric about 0. For
each $n$, let

\[ Q_n = Q_{1,n} * Q_{2,n} * \cdots * Q_{n,n}. \]

In order that the sequence $(Q_n: n = 1,2,\dots)$ converge it is necessary and
sufficient that there exist a nonnegative number $\sigma$ and a Lévy measure $\nu$, symmetric
about 0, satisfying the following two conditions:

\[ \nu[x,\infty) = \lim_{n\to\infty} \sum_{k=1}^n Q_{k,n}[x,\infty) \quad \text{if $0 < x$ and $\nu\{x\} = 0$;} \tag{16.25} \]

\[ \sigma^2 = \lim_{\varepsilon \searrow 0} \limsup_{n\to\infty} \sum_{k=1}^n \int_{[-\varepsilon,\varepsilon]} x^2 \, Q_{k,n}(dx)
 = \lim_{\varepsilon \searrow 0} \liminf_{n\to\infty} \sum_{k=1}^n \int_{[-\varepsilon,\varepsilon]} x^2 \, Q_{k,n}(dx). \tag{16.26} \]

In case these conditions are satisfied, the sequence $(Q_n: n = 1,2,\dots)$
converges to the infinitely divisible distribution corresponding to $(0,\sigma,\nu)$ via the
Lévy-Khinchin Representation Theorem.


* Problem 50. Prove the preceding theorem. 


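Theorem 25. [nonnegative case] Let $(Q_{k,n}: 1 \le k \le n,\ n = 1,2,\dots)$ be a
uan triangular array of distributions on $\mathbb{R}^+$. For each $n$, let

\[ Q_n = Q_{1,n} * Q_{2,n} * \cdots * Q_{n,n}. \]

In order that the sequence $(Q_n: n = 1,2,\dots)$ converge to a distribution on $\mathbb{R}^+$
it is necessary and sufficient that there exist a nonnegative number $\xi$ and a Lévy
measure $\nu$ for $\mathbb{R}^+$ satisfying the following two conditions:

\[ \nu[x,\infty) = \lim_{n\to\infty} \sum_{k=1}^n Q_{k,n}[x,\infty) \quad \text{if $0 < x$ and $\nu\{x\} = 0$;} \tag{16.27} \]

\[ \xi = \lim_{\varepsilon \searrow 0} \limsup_{n\to\infty} \sum_{k=1}^n \int_{[0,\varepsilon]} x \, Q_{k,n}(dx)
 = \lim_{\varepsilon \searrow 0} \liminf_{n\to\infty} \sum_{k=1}^n \int_{[0,\varepsilon]} x \, Q_{k,n}(dx). \tag{16.28} \]

In case these conditions are satisfied, the sequence $(Q_n: n = 1,2,\dots)$ converges
to the infinitely divisible distribution on $\mathbb{R}^+$ corresponding to $(\xi,\nu)$ via the
Lévy-Khinchin Representation Theorem.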


Problem 51. Prove the preceding theorem. 


Problem 52. Redo Problem 40 using Theorem 25. 
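
To see the conditions of Theorem 25 at work in a row-wise independent array whose rows are not identically distributed, consider the made-up example $Q_{k,n}\{1\} = p_{k,n} = 2k/n^2$, $Q_{k,n}\{0\} = 1 - p_{k,n}$, for $n \ge 2$. The array is uan because $\max_k p_{k,n} = 2/n$. In (16.27) the sum equals $\sum_k p_{k,n} = (n+1)/n$ for $0 < x \le 1$ and 0 for $x > 1$, which suggests $\nu = \delta_1$, and the integrals in (16.28) vanish for $\varepsilon < 1$, which suggests $\xi = 0$. The limit should then be Poisson with mean 1. The sketch below computes the exact row-sum distribution and its total variation distance to that Poisson distribution. The function names are hypothetical.

```python
import math


def row_sum_pmf(ps):
    """Exact distribution of a sum of independent Bernoulli(p) variables, p in ps."""
    pmf = [1.0]
    for p in ps:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - p)      # this summand equals 0
            nxt[k + 1] += mass * p        # this summand equals 1
        pmf = nxt
    return pmf


def poisson_pmf(lam, size):
    """First `size` values of the Poisson probability mass function with mean lam."""
    out = [math.exp(-lam)]
    for k in range(1, size):
        out.append(out[-1] * lam / k)
    return out


def tv_to_poisson(n, lam=1.0):
    """Total variation distance between the n-th row sum (n >= 2) and Poisson(lam)."""
    ps = [2 * k / n ** 2 for k in range(1, n + 1)]  # p_{k,n}
    a = row_sum_pmf(ps)
    b = poisson_pmf(lam, len(a))
    tail = max(0.0, 1.0 - sum(b))  # Poisson mass beyond the support of the row sum
    return 0.5 * (sum(abs(x - y) for x, y in zip(a, b)) + tail)


if __name__ == "__main__":
    for n in (50, 100, 200, 400):
        print(n, tv_to_poisson(n))
```

The distances shrink as $n$ grows, in line with the convergence the theorem predicts for this array.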




Problem 53. Redo Problem 42 using Theorem 24. 


Problem 54. Let Qk,n be the standard exponential distribution with mean reese 
Decide if 


\[ \lim (Q_{1,n} * Q_{2,n} * \cdots * Q_{n,n}) \]

exists as a distribution on $\mathbb{R}^+$, and if so, find its moment generating function.
Problem 55. Redo Problem 43 and Problem 44 using Theorem 25. 


Problem 56. Define Qk,n by Qin {0} = 5 and 


1 1 
ml eet = 3: 


Decide if $\lim (Q_{1,n} * Q_{2,n} * \cdots * Q_{n,n})$ exists, and if so, evaluate it.

Problem 57. For each $\alpha > 0$, check the uan condition and study the limiting
behavior of the sequence of row sums of the row-wise independent triangular array
of two-sided exponential random variables with mean 0 and the variance of the $k^{\text{th}}$
random variable in the $n^{\text{th}}$ row equaling $(k + n)^{-\alpha}$.


Problem 58. For 1 < k <n’, define Qin by 


k12 
Qk,n{0} a 1 _ In? 
dQk,n T 1 N 


where A denotes Lebesgue measure. Study the limiting behavior of the sequence 
(Qin *****Qn2,.n: n > 1) asn— oo. In particular, if the limit Q exists, calculate 


Q{0}. 
Problem 59. For 1 < k <n’, define Qk.n by 


k?/2 
Qin {0} = 1- a 
Qk ny n |x| 1 
“ay = ome Foal eee 
where à denotes Lebesgue measure. Study the limiting behavior of the sequence 
(Qin *°*-* Qn2.,: n > 1) as n — oo. In particular, if the limit Q exists, calculate 


Q{0}. 


Problem 60. Adapt Theorem 25 to the $\overline{\mathbb{R}}^+$-setting in order to give necessary and
sufficient conditions for convergence to a limit different from the delta distribution
at $\infty$, including in your statement an identification of the limit when it exists.


Problem 61. Define Qn by Qn{=} = 2 and Qnr{n} = L, For the R’ -setting 


n 
decide if there is a distribution Q different from the delta distribution at oo such 


that 
Q= lim Q,”. 


n= oo 


If so find Q explicitly. 




16.9. {| General triangular arrays 


Theorem 22, Theorem 24, and Theorem 25 are all special cases of the main 
result of this section, Theorem 30, although in the cases of Theorem 22 and 
Theorem 25, some work is needed to extract them as corollaries of the general 
theorem. This section also contains another special case of the triangular array 
problem, which we call the Centered Case Lemma. It is called a lemma because 
its chief purpose is to help us prove the general theorem. The following two 
problems concern some elementary inequalities which will be used in the proof 
of the Centered Case Lemma. 


Problem 62. Prove that if $w \in [0,1]$, there exists a constant $c$ depending on $w$
such that

\[ \int_0^w |u\chi(x) - \sin ux| \, du \le c \int_0^w (1 - \cos ux) \, du \]

for all $x \in \mathbb{R}$. Hint: For $|x| < 1$, the integrals can be explicitly calculated, and then
bounds from Appendix E can be used. For the remaining values of $x$, note that
for fixed $w$, the integral on the left is bounded above by a finite constant, and the
integral on the right is bounded below by a positive constant.


Problem 63. Prove that for all $w \in [0,1]$ and all $x \in \mathbb{R}$,

\[ \frac{w^3}{12} (x^2 \wedge 1) \le \int_0^w (1 - \cos ux) \, du. \]


Hint: For x € (0, an use the inequality 


6 120 =Y 2 


which may be obtained from an inequality in Appendix E. 
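
Before attempting a proof, the inequality of Problem 63, read here as $\frac{w^3}{12}(x^2 \wedge 1) \le \int_0^w (1 - \cos ux)\,du$, can be spot-checked on a grid. The sketch below is only a numerical check under that reading, not an argument: it uses the closed form $\int_0^w (1 - \cos ux)\,du = w - \sin(wx)/x$ for $x \ne 0$, and the grid spacing and range are arbitrary choices.

```python
import math


def lhs(w: float, x: float) -> float:
    """Left side: (w^3 / 12) * min(x^2, 1)."""
    return w ** 3 / 12 * min(x * x, 1.0)


def rhs(w: float, x: float) -> float:
    """Right side in closed form: integral of 1 - cos(u x) over u in [0, w], for x != 0."""
    return w - math.sin(w * x) / x


def holds_on_grid(step: float = 0.05, x_max: float = 20.0) -> bool:
    """Check the inequality for w in (0, 1] and 0 < |x| <= x_max on a grid."""
    ws = [step * j for j in range(1, int(round(1 / step)) + 1)]
    xs = [step * k for k in range(1, int(round(x_max / step)) + 1)]
    return all(
        lhs(w, sign * x) <= rhs(w, sign * x)
        for w in ws
        for x in xs
        for sign in (1, -1)
    )


if __name__ == "__main__":
    print(holds_on_grid())
```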


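Lemma 26. [centered case] Let $(R_{k,n}: 1 \le k \le n,\ n = 1,2,\dots)$ be a uan
triangular array of distributions on $\mathbb{R}$ such that

\[ \int_{\mathbb{R}} \chi \, dR_{k,n} = 0 \quad \text{for all $k$ and $n$.} \]

For each $n$, let

\[ R_n = R_{1,n} * R_{2,n} * \cdots * R_{n,n}. \]

In order that the sequence $(R_n: n = 1,2,\dots)$ converge it is necessary and
sufficient that there exist a nonnegative number $\sigma$ and a Lévy measure $\nu$ satisfying
the following two conditions:

\[ \nu(B) = \lim_{n\to\infty} \sum_{k=1}^n R_{k,n}(B) \quad \text{for closed intervals $B$ such that $0 \notin B$ and $\nu(\partial B) = 0$;} \tag{16.29} \]

\[ \sigma^2 = \lim_{\varepsilon \searrow 0} \limsup_{n\to\infty} \sum_{k=1}^n \int_{[-\varepsilon,\varepsilon]} x^2 \, R_{k,n}(dx)
 = \lim_{\varepsilon \searrow 0} \liminf_{n\to\infty} \sum_{k=1}^n \int_{[-\varepsilon,\varepsilon]} x^2 \, R_{k,n}(dx). \tag{16.30} \]

In case these conditions are satisfied, the sequence $(R_n: n = 1,2,\dots)$
converges to the infinitely divisible distribution corresponding to $(0,\sigma,\nu)$ via the
Lévy-Khinchin Representation Theorem.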


PROOF. Let $\beta_{k,n}$ denote the characteristic function of $R_{k,n}$. The plan is to
begin (Steps 1a and 1b of the proof) by showing that the assumption that $(R_n)$
converges and the assumption that both (16.29) and (16.30) hold each lead to
the conclusion that

\[ \sup\Bigl\{ \sum_{k=1}^n \int (x^2 \wedge 1) \, R_{k,n}(dx) : n = 1,2,\dots \Bigr\} < \infty, \tag{16.31} \]

then (Step 2) use this fact in conjunction with the uan condition to show the
logarithm can be replaced by the first term of its expansion, and finally (Step 3)
use Theorem 12 as in the proof of Theorem 22.

Step 1a. Suppose that $(R_n: n = 1,2,\dots)$ converges to some limit $R$. Fix
$w \in (0,1)$ in the domain of the characteristic exponent of $R$. By Proposition 19,
the characteristic exponent of $R_n$ converges to the characteristic exponent of $R$
uniformly on $[0,w]$ as $n \to \infty$. In terms of the functions $\beta_{k,n}$, we have

\[ \limsup_{n\to\infty} \sup_{0 \le u \le w} \sum_{k=1}^n \Re\bigl( -\log \circ \beta_{k,n}(u) \bigr) < \infty. \]

That is,

\[ \limsup_{n\to\infty} \sup_{0 \le u \le w} \Bigl\{ \sum_{k=1}^n -\log\bigl( 1 - [1 - \Re^2(\beta_{k,n}(u)) - \Im^2(\beta_{k,n}(u))] \bigr) \Bigr\} < \infty. \]

Using $-\log(1 - t) \ge t$, $t \in [0,1)$, we see that

\[ \limsup_{n\to\infty} \Bigl\{ \sup_{0 \le u \le w} \sum_{k=1}^n [1 - \Re^2(\beta_{k,n}(u)) - \Im^2(\beta_{k,n}(u))] \Bigr\} < \infty. \]

By Lemma 20, $\Re(\beta_{k,n}(u)) \to 1$ uniformly for $u \in [0,w]$ and $k = 1,\dots,n$ as
$n \to \infty$, so

\[ \limsup_{n\to\infty} \Bigl\{ \sup_{0 \le u \le w} \sum_{k=1}^n [1 - \Re(\beta_{k,n}(u)) - \Im^2(\beta_{k,n}(u))] \Bigr\} < \infty. \]

Integration on $u$ gives

\[ \limsup_{n\to\infty} \Bigl\{ \sum_{k=1}^n \int_0^w [1 - \Re(\beta_{k,n}(u)) - \Im^2(\beta_{k,n}(u))] \, du \Bigr\} < \infty. \tag{16.32} \]

Next we use $\int \chi \, dR_{k,n} = 0$ to obtain

\[ \int_0^w \Im^2(\beta_{k,n}(u)) \, du \le \max_{0 \le u \le w} |\Im(\beta_{k,n}(u))| \int_0^w \int |\sin ux - u\chi(x)| \, R_{k,n}(dx) \, du. \tag{16.33} \]

By Problem 62 and two applications of the Fubini Theorem, there exists a
constant $c$ depending only on $w$ such that for $1 \le k \le n = 1,2,\dots$,

\[ \int_0^w \int |\sin ux - u\chi(x)| \, R_{k,n}(dx) \, du
 \le c \int \int_0^w (1 - \cos ux) \, du \, R_{k,n}(dx)
 = c \int_0^w [1 - \Re(\beta_{k,n}(u))] \, du. \tag{16.34} \]

By Lemma 20, $\Im(\beta_{k,n}(u)) \to 0$ uniformly for $u \in [0,w]$ and $k = 1,\dots,n$ as
$n \to \infty$, so it follows from (16.33) and (16.34) that removing the term $\Im^2(\beta_{k,n}(u))$
from (16.32) does not affect the finiteness of the quantity given there. That is,

\[ \limsup_{n\to\infty} \Bigl\{ \sum_{k=1}^n \int_0^w [1 - \Re(\beta_{k,n}(u))] \, du \Bigr\} < \infty. \]

The inequality in (16.31) now follows from Problem 63 and the Fubini Theorem.
Step 1b. Suppose now that (16.29) and (16.30) hold. From (16.30) it follows
that there exists $\varepsilon > 0$ for which

$$\limsup_{n\to\infty} \sum_{k=1}^{n} \int_{[-\varepsilon,\varepsilon]} x^2 \, R_{k,n}(dx) < \infty.$$

From (16.29) we obtain

$$\limsup_{n\to\infty} \sum_{k=1}^{n} R_{k,n}(\{x : |x| > \varepsilon\}) < \infty.$$

Combining these two facts, we obtain (16.31).
Step 2. Using $\int \chi \, dR_{k,n} = 0$ we have (for sufficiently large $n$ depending on
$u$)

$$\sum_{k=1}^{n} -(\log\circ\beta_{k,n})(u) = \sum_{k=1}^{n} -\log\Bigl(1 - \int \bigl[(1 - \cos ux) + i(-\sin ux + u\chi(x))\bigr] \, R_{k,n}(dx)\Bigr).$$

We want to study the limiting behavior as $n \to \infty$ by replacing the logarithm
by the first term of its series. Doing this will be valid provided that we show

$$\lim_{n\to\infty} \sum_{k=1}^{n} \Bigl[ \Bigl(\int (1 - \cos ux) \, R_{k,n}(dx)\Bigr)^2 + \Bigl(\int (-\sin ux + u\chi(x)) \, R_{k,n}(dx)\Bigr)^2 \Bigr] = 0 \tag{16.35}$$

whenever (16.31) holds. Since the functions $x \rightsquigarrow (1 - \cos ux)$ and $x \rightsquigarrow (-\sin ux + u\chi(x))$
are bounded and are continuous at 0 with the value 0 there, the uan
condition implies

$$\lim_{n\to\infty} \max_{1 \le k \le n} \Bigl[ \int (1 - \cos ux) \, R_{k,n}(dx) + \Bigl| \int (-\sin ux + u\chi(x)) \, R_{k,n}(dx) \Bigr| \Bigr] = 0.$$

Therefore, to prove (16.35) we only need prove

$$\limsup_{n\to\infty} \sum_{k=1}^{n} \Bigl[ \int (1 - \cos ux) \, R_{k,n}(dx) + \Bigl| \int (-\sin ux + u\chi(x)) \, R_{k,n}(dx) \Bigr| \Bigr] < \infty.$$

This is a consequence of Step 1 because $1 - \cos ux$ is bounded by a multiple
(depending on $u$) of $x^2 \wedge 1$ and $|-\sin ux + u\chi(x)|$ is bounded by a multiple of
$|x|^3 \wedge 1$ and thus by a multiple of $x^2 \wedge 1$.

Step 3. By Step 2, $\sum_{k=1}^{n} -(\log\circ\beta_{k,n}(u))$ has the same limiting behavior as

$$\int \bigl[(1 - \cos ux) + i(-\sin ux + u\chi(x))\bigr] \Bigl(\sum_{k=1}^{n} R_{k,n}\Bigr)(dx) = \int \bigl[1 - e^{iux} + iu\chi(x)\bigr] \Bigl(\sum_{k=1}^{n} R_{k,n}\Bigr)(dx).$$

This is the characteristic exponent of the infinitely divisible distribution charac-
terized, via its Lévy-Khinchin representation, by the triple $(0, 0, \sum_{k=1}^{n} R_{k,n})$. The
desired result now follows from Theorem 12. □


For the proof of the forthcoming general theorem we will not use the first 
term of the series for the logarithm. Rather we will use centering to transform 
the distributions to ones to which the preceding lemma can be applied. The 
following lemmas will be of use in connection with this centering. 


Lemma 27. For any distribution $Q$ on $\mathbb{R}$, the equation

$$\int \chi(x - q) \, Q(dx) = 0$$

has at least one solution $q$.


Problem 64. Prove the preceding lemma. 
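The centering equation of Lemma 27 can be explored numerically. The following is only a sketch for a distribution with finitely many atoms, and it assumes that $\chi$ is the truncation function $\chi(x) = -1 \vee (x \wedge 1)$; the names `chi` and `centering_constant` are hypothetical. Under that assumption $q \rightsquigarrow \int \chi(x - q)\,Q(dx)$ is continuous and nonincreasing, so bisection locates a solution (which need not be unique).

```python
def chi(x: float) -> float:
    """Truncation function, assumed here to be x clipped to [-1, 1]."""
    return max(-1.0, min(1.0, x))


def centering_constant(atoms, weights, tol=1e-12):
    """Find q with sum_j weights[j] * chi(atoms[j] - q) = 0 by bisection.

    atoms, weights describe a discrete probability distribution Q
    (weights are assumed positive and to sum to 1).
    """
    def h(q):
        return sum(w * chi(a - q) for a, w in zip(atoms, weights))

    lo = min(atoms) - 1.0  # here every chi(a - lo) equals 1, so h(lo) = 1 > 0
    hi = max(atoms) + 1.0  # here every chi(a - hi) equals -1, so h(hi) = -1 < 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Q puts mass 0.5 at 0, mass 0.3 at 0.2, and mass 0.2 at 5.
q = centering_constant([0.0, 0.2, 5.0], [0.5, 0.3, 0.2])
print(q)
```

For this example the far atom at 5 contributes only its truncated value, so the centering constant differs from the mean of the distribution.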




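Lemma 28. Let $(Q_{k,n} : 1 \le k \le n = 1,2,\dots)$ be a uan triangular array of
distributions on $\mathbb{R}$ and suppose that numbers $q_{k,n}$ satisfy

$$\int \chi(x - q_{k,n}) \, Q_{k,n}(dx) = 0.$$

Then

$$\lim_{n\to\infty} \max_{1 \le k \le n} |q_{k,n}| = 0.$$


Problem 65. Prove the preceding lemma.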


The following lemma states that if we use the constants $q_{k,n}$ to center the
distributions $Q_{k,n}$, then we obtain a uan triangular array $(R_{k,n})$ which has the
property that it satisfies conditions (16.29) and (16.30) of the Centered Case
Lemma if and only if $(Q_{k,n})$ satisfies the analogous conditions (16.36) and (16.37)
of the general result. The reader should look ahead to Theorem 30 to see precisely
what those analogous conditions state before studying the following lemma.


Lemma 29. Let $Q_{k,n}$ and $q_{k,n}$ be as in the preceding lemma and define $R_{k,n}$
by $R_{k,n}(B) = Q_{k,n}(B + q_{k,n})$ for all Borel sets $B$. Then:

(i) the triangular array $(R_{k,n})$ is uan;

(ii) for any Lévy measure $\nu$, the statement (16.29) and its
analogue (16.36) are equivalent;

(iii) if (16.36) is true for some Lévy measure $\nu$, then

$$\lim_{n\to\infty} \sum_{k=1}^{n} \Bigl[ q_{k,n} - \int \chi(x) \, Q_{k,n}(dx) \Bigr] = 0;$$

(iv) if (16.36) is true for some Lévy measure $\nu$, then the
statement (16.30) and its analogue (16.37) are equivalent.


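Problem 66. Prove the preceding lemma. Hint: For the last statement in the
lemma prove that if (16.36) holds, then (16.30) and (16.37) are each equivalent to
the following statement:

$$\sigma^2 = \lim_{\varepsilon \searrow 0} \limsup_{n\to\infty} \sum_{k=1}^{n} \Bigl[ \int_{[-\varepsilon,\varepsilon]} x^2 \, Q_{k,n}(dx) - q_{k,n}^2 \Bigr] = \lim_{\varepsilon \searrow 0} \liminf_{n\to\infty} \sum_{k=1}^{n} \Bigl[ \int_{[-\varepsilon,\varepsilon]} x^2 \, Q_{k,n}(dx) - q_{k,n}^2 \Bigr].$$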


Now we are in position to solve the general triangular array problem. 




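Theorem 30. [general case] Let $(Q_{k,n} : 1 \le k \le n, \ n = 1,2,\dots)$ be a uan
triangular array of distributions on $\mathbb{R}$. For each $n$, let

$$Q_n = Q_{1,n} * Q_{2,n} * \cdots * Q_{n,n}.$$

In order that the sequence $(Q_n : n = 1,2,\dots)$ converge it is necessary and suf-
ficient that there exist a real number $\eta$, a nonnegative number $\sigma$, and a Lévy
measure $\nu$ satisfying the following three conditions:

$$\nu(B) = \lim_{n\to\infty} \sum_{k=1}^{n} Q_{k,n}(B) \quad \text{for closed intervals } B \text{ such that } 0 \notin B \text{ and } \nu(\partial B) = 0; \tag{16.36}$$

$$\sigma^2 = \lim_{\varepsilon \searrow 0} \limsup_{n\to\infty} \sum_{k=1}^{n} \Bigl[ \int_{[-\varepsilon,\varepsilon]} x^2 \, Q_{k,n}(dx) - \Bigl( \int_{[-\varepsilon,\varepsilon]} x \, Q_{k,n}(dx) \Bigr)^2 \Bigr] = \lim_{\varepsilon \searrow 0} \liminf_{n\to\infty} \sum_{k=1}^{n} \Bigl[ \int_{[-\varepsilon,\varepsilon]} x^2 \, Q_{k,n}(dx) - \Bigl( \int_{[-\varepsilon,\varepsilon]} x \, Q_{k,n}(dx) \Bigr)^2 \Bigr]; \tag{16.37}$$

$$\eta = \lim_{n\to\infty} \sum_{k=1}^{n} \int \chi(x) \, Q_{k,n}(dx). \tag{16.38}$$

In case these conditions are satisfied, the sequence $(Q_n : n = 1,2,\dots)$ converges
to the infinitely divisible distribution corresponding to $(\eta,\sigma,\nu)$ via the
Lévy-Khinchin Representation Theorem.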
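To see the three conditions of Theorem 30 at work, here is a small numerical sketch for the classical Poisson array, in which every $Q_{k,n}$ puts mass $\lambda/n$ at 1 and the rest at 0. The function name `row_sums` is hypothetical, and the truncation function $\chi(x) = -1 \vee (x \wedge 1)$ is an assumption of this sketch. The three returned numbers are the row sums appearing in (16.36) for a closed interval $[a,b]$, in (16.37) for a fixed $\varepsilon$, and in (16.38).

```python
def chi(x: float) -> float:
    """Truncation function, assumed here to be x clipped to [-1, 1]."""
    return max(-1.0, min(1.0, x))


def row_sums(n: int, lam: float, eps: float, interval):
    """Row sums from (16.36), (16.37), (16.38) for the array
    Q_{k,n} = (1 - lam/n) delta_0 + (lam/n) delta_1, k = 1, ..., n."""
    p = lam / n
    atoms = ((0.0, 1.0 - p), (1.0, p))  # (location, mass) pairs of each Q_{k,n}
    a, b = interval
    levy = n * sum(m for x, m in atoms if a <= x <= b)
    second = sum(m * x * x for x, m in atoms if abs(x) <= eps)
    first = sum(m * x for x, m in atoms if abs(x) <= eps)
    sigma_sq = n * (second - first ** 2)
    eta = n * sum(m * chi(x) for x, m in atoms)
    return levy, sigma_sq, eta


levy, sigma_sq, eta = row_sums(n=1000, lam=2.0, eps=0.5, interval=(0.5, 1.5))
print(levy, sigma_sq, eta)
```

With $\varepsilon < 1$ the truncated variance term vanishes, so the sketch is consistent with the triple $(\lambda, 0, \lambda\delta_1)$, which is the triple of the Poisson distribution with mean $\lambda$ under the assumed $\chi$.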


PROOF. We first introduce some notation. By Lemma 27, there exist numbers
$q_{k,n}$ such that

$$\int_{\mathbb{R}} \chi(x - q_{k,n}) \, Q_{k,n}(dx) = 0$$

for each $k$ and $n$. Define distributions $R_{k,n}$ by $R_{k,n}(B) = Q_{k,n}(B + q_{k,n})$. It is
clear that $\int \chi \, dR_{k,n} = 0$ for each $k$ and $n$, and by Lemma 29, $(R_{k,n})$ is a uan
triangular array so that Lemma 26 applies to it. For each $n$, let

$$R_n = R_{1,n} * R_{2,n} * \cdots * R_{n,n}.$$

Note that $R_n(B) = Q_n(B + \eta_n)$ for all Borel sets $B$, where

$$\eta_n = \sum_{k=1}^{n} q_{k,n}.$$


We now prove the sufficiency of (16.36), (16.37), and (16.38). Suppose that
(16.36) and (16.37) hold. By Lemma 29, (16.29) and (16.30) hold. By Lemma 26,
the sequence $(R_n : n = 1,2,\dots)$ converges to the infinitely divisible distribution
characterized by the triple $(0,\sigma,\nu)$. Suppose that (16.38) also holds. Then
(iii) in Lemma 29 implies that $\eta_n \to \eta$ as $n \to \infty$, so the sequence
$(Q_n : n = 1,2,\dots)$ converges to the infinitely divisible distribution characterized by the
triple $(\eta,\sigma,\nu)$, as desired.

To prove the necessity of (16.36), (16.37), and (16.38), we introduce the nota-
tion $\tilde{T}$ corresponding to any distribution $T$: $\tilde{T}$ is the symmetrization of $T$ about
0. Suppose that $\lim Q_n = Q$ for some $Q$. Then since $\tilde{R}_n = \tilde{Q}_n$ for all $n$,

$$\lim_{n\to\infty} \tilde{R}_n = \lim_{n\to\infty} \tilde{Q}_n = \tilde{Q}.$$

We will show that the convergence of the sequence $(\tilde{R}_n : n = 1,2,\dots)$ and the
uan condition imply that the sequence $(R_n : n = 1,2,\dots)$ has a convergent sub-
sequence $(R_{n_m} : m = 1,2,\dots)$. Assuming for the moment that this claim is true,
the proof is completed as follows. By the Centered Case Lemma and Lemma 29,
(16.36) and (16.37) hold for the corresponding subsequence $(Q_{n_m} : m = 1,2,\dots)$
for some Lévy measure $\nu$ and nonnegative number $\sigma$. By the Convergence of Types
Theorem, there exists a real number $\eta$ such that $\eta_{n_m} \to \eta$ as $m \to \infty$, so (16.38)
also holds along the subsequence. By the part of the theorem already proved, the
distribution $Q$, being the limit of the sequence $(Q_{n_m} : m = 1,2,\dots)$, is infinitely
divisible and determines the triple $(\eta,\sigma,\nu)$ via the Lévy-Khinchin Representation
Theorem. The same argument applies to any subsequence, so (16.36), (16.37),
and (16.38) hold along the full sequence.

Thus to complete the proof, it remains to show that the sequence $(R_n : n =
1,2,\dots)$ has a convergent subsequence. For each $k$ and $n$, let $\beta_{k,n}$ be the char-
acteristic function of $R_{k,n}$. By the continuity of the exponential function and
Theorem 13 of Chapter 14, it is enough to prove that the sequence of functions

$$\Bigl( \sum_{k=1}^{n} \log\circ\beta_{k,n} : n = 1,2,\dots \Bigr)$$

is equicontinuous at 0. In order to do so, we will prove equicontinuity separately
for the real and imaginary parts.

We begin with the equicontinuity of the sequence of real parts. Since $|\beta_{k,n}|^2$
is the characteristic function of $\tilde{R}_{k,n}$ and the sequence $(\tilde{R}_n)$ converges, Proposi-
tion 19 implies that the sequence of functions

$$\Bigl( \sum_{k=1}^{n} \log\circ(|\beta_{k,n}|^2) : n = 1,2,\dots \Bigr)$$

converges uniformly on some open interval containing the origin. As a conse-
quence, this sequence is equicontinuous at 0. Since $\Re(\log\circ\beta_{k,n}) = \log\circ(|\beta_{k,n}|)$
for all $k,n$, the proof of equicontinuity at 0 of the family of real parts is complete.

The rest of the proof concerns the equicontinuity of the family of imaginary
parts. It is sufficient to show that

$$\lim_{u\to 0} \sup_{n} \sum_{k=1}^{n} |\Im(\log\circ\beta_{k,n}(u))| = 0. \tag{16.39}$$




Recall that the imaginary part of the logarithm of a point $z$ in the complex
plane equals one of the values of $\arg(z)$, a polar coordinates angle of $z$ that is
determined only up to integer multiples of $2\pi$. Lemma 20 implies that for large
$n$, the complex number $\beta_{k,n}(u)$ stays close to 1 for $u \in [-1,1]$ and $k = 1,\dots,n$,
so $\arg(\beta_{k,n}(u))$ stays close to 0. Furthermore, because $\beta_{k,n}(u)$ stays close to 1
for large $n$, we may choose $l$ sufficiently large so that

$$|\arg(\beta_{k,n}(u))| \le 2|\Im(\beta_{k,n}(u))|$$

for $u \in [-1,1]$, $n \ge l$, and $k = 1,\dots,n$. Thus

$$|\Im(\log\circ\beta_{k,n}(u))| \le 2|\Im(\beta_{k,n}(u))| \quad \text{for } u \in [-1,1], \ n \ge l, \text{ and } k = 1,\dots,n.$$

We have justified replacing the imaginary part of the logarithm of $\beta_{k,n}$ by the
imaginary part of $\beta_{k,n}$ in (16.39). Thus, since $\int \chi(x) \, R_{k,n}(dx) = 0$ for all $k,n$,
it is enough to show that

$$\lim_{u\to 0} \sup_{n} \sum_{k=1}^{n} \Bigl| \int_{\mathbb{R}} (u\chi(x) - \sin ux) \, R_{k,n}(dx) \Bigr| = 0.$$

For $|x| \le 1$, the absolute value of each integrand is bounded above by $|ux|^3$,
which is bounded above by $|u|^3 x^2$. For $|x| > 1$, an upper bound is $|u| + |ux| \wedge 1$,
which is bounded above by $2(|ux| \wedge 1)$ for small $u$. So we will be done once we
prove the following two statements:

$$\lim_{u\to 0} \sup_{n} \sum_{k=1}^{n} \int_{|x|>1} (|ux| \wedge 1) \, R_{k,n}(dx) = 0; \tag{16.40}$$

and

$$\sup_{n} \sum_{k=1}^{n} \int_{[-1,1]} x^2 \, R_{k,n}(dx) < \infty. \tag{16.41}$$


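To prove (16.40), we use the relation

$$R_{k,n}(x,\infty) \, R_{k,n}(-\tfrac{1}{2},\tfrac{1}{2}) \le \tilde{R}_{k,n}(x - \tfrac{1}{2},\infty),$$

which holds for all $x \in \mathbb{R}$. The uan condition implies that $R_{k,n}(-\tfrac{1}{2},\tfrac{1}{2}) > \tfrac{1}{2}$ for
sufficiently large $n$ and $k = 1,\dots,n$. For such $n$, we have

$$\sum_{k=1}^{n} R_{k,n}(x,\infty) \le 2 \sum_{k=1}^{n} \tilde{R}_{k,n}(x - \tfrac{1}{2},\infty), \quad x \in \mathbb{R}. \tag{16.42}$$

Since the sequence of symmetrized distributions $(\tilde{R}_n)$ converges, (16.29) implies
that

$$\limsup_{n\to\infty} \sum_{k=1}^{n} \tilde{R}_{k,n}(x - \tfrac{1}{2},\infty) \le \nu[x - \tfrac{1}{2},\infty) < \infty \quad \text{for } x > \tfrac{1}{2},$$

where $\nu$ is some Lévy measure. Similar facts hold for intervals of the form
$(-\infty,-x)$. It then follows from (16.42) and the Continuity of Measure Theorem
that

$$\sup_{n} \sum_{k=1}^{n} R_{k,n}([-1,1]^c) < \infty \tag{16.43}$$

and

$$\lim_{x\to\infty} \sup_{n} \sum_{k=1}^{n} R_{k,n}([-x,x]^c) = 0.$$

The statement in (16.40) now follows easily.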

To prove (16.41), we again make a comparison with the array $(\tilde{R}_{k,n})$. Since
$\tilde{R}_{k,n}$ is the distribution of the difference of two independent random variables,
each having distribution $R_{k,n}$, the Fubini Theorem implies that


J r? Ry n(dz) = J f (z — y)? Rg ndx) Rk n(dy) 
[—1,1] [—1,1] ¥[—1,1] 


2 
= 2 | a” Rk n(dz)— 2 (J 2 Rin(da) ) l 
(=i1] Sii] 


Since the symmetrized sequence $(\tilde{R}_n)$ converges, (16.31) applies to the measures
$\tilde{R}_{k,n}$, so

$$\sup_{n} \sum_{k=1}^{n} \int_{[-1,1]} x^2 \, \tilde{R}_{k,n}(dx) < \infty.$$

Thus, in order to prove (16.41), it suffices to show that

$$\sup_{n} \sum_{k=1}^{n} \Bigl( \int_{[-1,1]} x \, R_{k,n}(dx) \Bigr)^2 < \infty.$$

But this last statement follows from (16.43) and the fact that $\int \chi(x) \, R_{k,n}(dx) = 0$. □


Problem 67. Apply Theorem 30 to the triangular array $(Q_{k,n})$ where $Q_{k,n}$ is the
delta distribution at $(-1)^k / n^{1/3}$. Show that for this example, (16.37) cannot be
replaced by a simpler condition only involving $\int_{[-\varepsilon,\varepsilon]} x^2 \, Q_{k,n}(dx)$. Also, explain
why this example indicates that there was no chance of directly proving Theorem 30
using the first term (or even the first two terms) of the series for the logarithm.


* Problem 68. Apply Theorem 30 to the triangular array (Qkn:1 < k < n = 
1,2,...) of distributions, where Qk,n is the uniform distribution on the interval 


[0, zez] if k is odd and the uniform distribution on Aa 0] if k is even. 




Problem 69. Apply Theorem 30 to the triangular array of distributions given by 
1 1 
n = (1 — —)ói kn- =À, 
Qin = (1 = =) 5(-ien-2) + = 


where $\delta_a$ denotes the delta distribution at $a$ and $\lambda$ denotes Lebesgue measure on
$(0,1)$.


Problem 70. Let $(Y_k : k = 1,2,\dots)$ be a sequence of independent random vari-
ables having 0 mean and finite variance. Denote the variance of $Y_k$ by $\sigma_k^2$ and its
distribution by $Q_k$. Consider the triangular array defined by

$$X_{k,n} = \frac{Y_k}{s_n}, \quad 1 \le k \le n = 1,2,\dots,$$

where $s_n = \bigl(\sum_{k=1}^{n} \sigma_k^2\bigr)^{1/2}$. Notice that the triangular array is independent within
rows and that each row sum $S_n$ has mean 0 and variance 1. Assume that

$$\lim_{n\to\infty} \max\Bigl\{ \frac{\sigma_k^2}{s_n^2} : 1 \le k \le n \Bigr\} = 0. \tag{16.44}$$

Prove that the triangular array is uan. Then prove the Lindeberg-Feller Theorem
that a necessary and sufficient condition for $(S_n : n = 1,2,\dots)$ to converge to a
normal random variable having mean 0 and variance 1 is that

$$\lim_{n\to\infty} \frac{1}{s_n^2} \sum_{k=1}^{n} \int_{|x| > t s_n} x^2 \, Q_k(dx) = 0 \tag{16.45}$$

for every $t > 0$. (Comment: (16.45) implies (16.44).) Also, apply the Lindeberg-
Feller Theorem to Problem 68.


Problem 71. Deduce Theorem 22 and Theorem 25 as corollaries of Theorem 30. 


Problem 72. Generalize Problem 48. 


CHAPTER 17 
Stable Distributions as Limits 


In Chapter 15, it was seen that if X1, X2,..., is an iid sequence of R-valued 
random variables with finite mean and variance, then as a consequence of the 
Classical Central Limit Theorem, quantities like 


$$P[x < X_1 + \cdots + X_n \le y] \tag{17.1}$$


can be estimated using the normal distribution function. It turns out that for
many iid sequences without finite variances, or even without finite means, such a
quantity can still be estimated using stable distribution functions. For instance,
useful information can be obtained about the distribution of the $n$th return to 0
of a simple symmetric random walk on $\mathbb{Z}$ (see Problem 20).

After introducing some technical preliminaries in the first section, we char- 
acterize the stable distributions. Since every stable distribution is infinitely 
divisible, this characterization is given in terms of the characteristic exponent in 
the Lévy-Khinchin Representation Theorem of Chapter 16. 

In the third and fourth sections of the chapter, we state and prove the results
that are needed for making estimates of quantities like (17.1). As with the
Classical Central Limit Theorem, such estimates are most useful when $n$ is large
and $x$ and $y$ are chosen in a manner that depends on $n$, so one recasts the problem
of estimating (17.1) as a problem involving a limit of the form

$$\lim_{n\to\infty} P\Bigl[ x < \frac{S_n - b_n}{a_n} \le y \Bigr],$$

where $S_n = X_1 + \cdots + X_n$. In the case of the Classical Central Limit Theorem,
one chooses $a_n = \sqrt{n \operatorname{Var}(X_1)}$ and $b_n = nE(X_1)$. The results given in the
present chapter include formulas for the appropriate choices of $a_n$ and $b_n$. They
also tell us the domain of attraction of each stable distribution. In other words,
they tell us, in terms of the distribution of $X_1$, whether or not a given stable
distribution is appropriate for estimating (17.1).
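As a concrete illustration of the normalization used in the Classical Central Limit Theorem, the following sketch compares the exact value of $P[x < (S_n - b_n)/a_n \le y]$ with its normal estimate when the summands are Bernoulli. The choice of Bernoulli summands and the function names `normal_cdf` and `binomial_interval_prob` are assumptions made only for this example.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def binomial_interval_prob(n: int, p: float, x: float, y: float) -> float:
    """Exact P[x < (S_n - b_n)/a_n <= y] for S_n a sum of n iid Bernoulli(p)."""
    a_n = math.sqrt(n * p * (1.0 - p))  # a_n = sqrt(n Var(X_1))
    b_n = n * p                         # b_n = n E(X_1)
    total = 0.0
    for k in range(n + 1):
        z = (k - b_n) / a_n
        if x < z <= y:
            total += math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)
    return total


exact = binomial_interval_prob(400, 0.5, -1.0, 1.0)
approx = normal_cdf(1.0) - normal_cdf(-1.0)
print(f"exact = {exact:.4f}, normal estimate = {approx:.4f}")
```

For summands without finite variance the same comparison would require different constants $a_n$ and $b_n$ and a stable distribution function in place of `normal_cdf`.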




17.1. Regular variation 


This section, which itself contains no probability, consists of tools useful for 
identifying all stable distributions and finding their domains of attraction. There 
may be some discrepancy between what is done here and what might be done in 
a full treatment of regular variation. Here we strive for only as much generality 
as is needed. 


Definition 1. A function $g \colon (c,\infty) \to (0,\infty)$, for some $c$, is said to vary
regularly or to be of regular variation at $\infty$ if there is a $(0,\infty)$-valued function
$f$ for which

$$\lim_{w\to\infty} \frac{g(wy)}{g(w)} = f(y), \tag{17.2}$$

for all but countably many $y \in (0,\infty)$.


It is easy to extend this definition to other settings, such as regularly varying
at $-\infty$ or regularly varying from the left at 0. We will focus here on regular
variation at $\infty$ for two reasons: (i) it is the concept needed for this chapter and
(ii) results for the other types of regular variation can then be obtained by a
simple change of variables.


Lemma 2. If $g$ is monotone and regularly varying at $\infty$ and $f$ is as in Def-
inition 1, then the domain of $f$ can be extended to $(0,\infty)$ so that (17.2) holds
for every $y \in (0,\infty)$. With this extended domain, $f$ is monotone and continuous
and satisfies

$$f(xy) = f(x)f(y), \quad x > 0, \ y > 0.$$


PROOF. The domain of $f$ consists of a subset of $(0,\infty)$ having countable
(conceivably empty) complement. The monotonicity of $g$ is obviously inherited
by $f$ on its domain. If $x$, $y$, and $xy$ are in the domain of $f$, then

$$f(xy) = \lim_{w\to\infty} \frac{g(wxy)}{g(w)} = \lim_{w\to\infty} \frac{g(wx)}{g(w)} \cdot \lim_{w\to\infty} \frac{g((wx)y)}{g(wx)} = f(x)f(y). \tag{17.3}$$

Fix $x > 0$. Suppose that $f(x-) > f(x+)$. By (17.3), $f((xy)-) > f((xy)+)$
for all $y$ in the domain of $f$ for which $f(y) > 0$. Since a monotone function cannot
have uncountably many points of discontinuity, the assumption $f(x-) > f(x+)$
is false. A similar argument shows $f(x-) < f(x+)$ to be false. Thus $f$ is
continuous at $x$ if defined there and, if $f$ is not defined at $x$, its domain can be
extended to $x$ so that it is continuous there. Now that the domain of $f$ has been
extended to all of $(0,\infty)$ so that $f$ is continuous and monotone, we deduce, from
the monotonicity of $g$ and $f$, that (17.2) and therefore (17.3) hold for all $x$ and
$y$ in $(0,\infty)$. □




The conditions in the preceding lemma can be written in terms of the function
$h = \log \circ f \circ \exp$. Monotonicity and continuity are inherited by $h$ from $f$, and $h$
satisfies

$$h(s+t) = h(s) + h(t), \quad s \in \mathbb{R}, \ t \in \mathbb{R}. \tag{17.4}$$


Proposition 3. Suppose that $g$ is monotone and regularly varying at $\infty$ and
that $f$ is given by (17.2). Then for some constant $\kappa$, $f(y) = y^{\kappa}$, $y \in (0,\infty)$.


Problem 1. Prove the preceding proposition. Hint: Prove that a continuous mono-
tone function $h$ that satisfies (17.4) necessarily equals the function $s \rightsquigarrow as$ for some
constant $a$.


The exponent $\kappa$ in the preceding proposition is called the index of regular
variation of the regularly varying function $g$. Regularly varying functions of
index 0 are said to be slowly varying.
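The defining ratio in (17.2) is easy to examine numerically. The sketch below uses two functions chosen only for illustration, $w^{1.5}\log w$ (which one expects to have index 1.5) and $(\log w)^3$ (which one expects to be slowly varying); the names `ratio` and `estimated_index` are hypothetical. Convergence in (17.2) can be slow, so a very large $w$ is used.

```python
import math


def ratio(g, w: float, y: float) -> float:
    """The quotient g(wy)/g(w) whose limit as w grows defines f(y) in (17.2)."""
    return g(w * y) / g(w)


def estimated_index(g, w: float, y: float = 2.0) -> float:
    """Solve y**kappa = g(wy)/g(w) for kappa at a single large w."""
    return math.log(ratio(g, w, y)) / math.log(y)


def power_times_log(w: float) -> float:
    return w ** 1.5 * math.log(w)


def log_cubed(w: float) -> float:
    return math.log(w) ** 3


for w in (1e4, 1e20, 1e100):
    print(w, estimated_index(power_times_log, w), estimated_index(log_cubed, w))
```

The printed estimates drift toward 1.5 and 0 respectively, but only at a logarithmic rate, which is typical when a slowly varying factor is present.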


Problem 2. Show that the function $\log^c$ is slowly varying at $\infty$ for every real
exponent $c$.


Problem 3. For which real $c$ is the function $\exp \circ (\log^c)$ slowly varying at $\infty$? For
which other values of $c$ is it regularly varying at $\infty$ and in those cases what is the
index of regular variation?


The next result says that monotone functions that vary regularly with index
$\kappa$ behave somewhat similarly to power functions with exponent $\kappa$.


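Lemma 4. Suppose that $g$ is monotone and regularly varying at $\infty$ with index
$\kappa$. Then, for any $\varepsilon > 0$,

$$g(w) = o(w^{\kappa+\varepsilon}) \quad \text{and} \quad w^{\kappa-\varepsilon} = o(g(w)) \quad \text{as } w \to \infty.$$


PROOF. We need only prove the lemma for $g$ increasing, since the result for
decreasing $g$ can be obtained immediately from the result for the increasing
function $1/g$.

Let $\varepsilon > 0$, and choose $\delta \in (0,\varepsilon)$. By the definition of $\kappa$ we may fix $x > 0$ so
that

$$2^{\kappa-\delta} < \frac{g(2y)}{g(y)} < 2^{\kappa+\delta} \tag{17.5}$$

for $y \ge x$. For each $w > x$ there is a positive integer $m$ such that $x2^{m-1} \le w <
x2^{m}$. From (17.5) we obtain

$$\frac{g(w)}{w^{\kappa+\varepsilon}} \le \frac{g(x2^{m})}{(x2^{m-1})^{\kappa+\varepsilon}} \le \frac{(2^{\kappa+\delta})^{m} g(x)}{(x2^{m-1})^{\kappa+\varepsilon}} = \frac{2^{\kappa+\varepsilon} g(x)}{x^{\kappa+\varepsilon}} \, 2^{m(\delta-\varepsilon)},$$

which approaches 0 as $w$ and therefore $m$ approach $\infty$. We also obtain

$$\frac{g(w)}{w^{\kappa-\varepsilon}} \ge \frac{g(x2^{m-1})}{(x2^{m})^{\kappa-\varepsilon}} \ge \frac{(2^{\kappa-\delta})^{m-1} g(x)}{(x2^{m})^{\kappa-\varepsilon}} = \frac{g(x)}{(2x)^{\kappa-\varepsilon}} \, 2^{(m-1)(\varepsilon-\delta)},$$

which approaches $\infty$ as $w$ approaches $\infty$. □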


17.2. The stable distributions 


Stable distributions were defined in Chapter 14 and we have already observed 
that every stable distribution is infinitely divisible. The issues we address in this 
section are those of identifying which infinitely divisible distributions are stable, 
in terms of their Lévy-Khinchin representations, and explicitly calculating their 
characteristic exponents. In the remaining two sections of the chapter we will 
focus on identifying domains of attraction and strict attraction, as defined in the 
last section of Chapter 15, but some preliminary work in that direction is in this 
section. 

Let $(\eta,\sigma,\nu)$ be the characterizing triple, via the Lévy-Khinchin Representation
Theorem, of a nondegenerate stable distribution $Q$ on $\mathbb{R}$. To obtain conditions
on $(\eta,\sigma,\nu)$ that are consequences of stability, we let $R$ be any distribution in the
domain of attraction of $Q$. ($R = Q$ would do for this section, but general $R$ is
needed for the next one.) Thus, there exist constants $a_n \in (0,\infty)$ and $b_n \in \mathbb{R}$
such that

$$\lim_{n\to\infty} R^{*n}(a_n B + b_n) = Q(B)$$

for all Borel $B$ for which $Q(\partial B) = 0$. It will be convenient throughout the
remainder of this chapter to set $c_n = b_n/n$ and to introduce distributions $Q_n$
defined by

$$Q_n(B) = R(a_n B + c_n), \quad B \text{ Borel}. \tag{17.6}$$

Note that $Q_n^{*n}(B) = R^{*n}(a_n B + b_n)$ for Borel sets $B$. Thus, we search here for
consequences of the assumption $Q_n^{*n} \to Q$ as $n \to \infty$.


Lemma 5. If $Q$ is nondegenerate and $Q_n^{*n} \to Q$ as $n \to \infty$, then $a_n \to \infty$,
$a_{n+1}/a_n \to 1$, and $c_n/a_n \to 0$ as $n \to \infty$.


Problem 4. Prove the preceding lemma. Hint: Use the last sentence of Theorem 22 
of Chapter 16. 


Continuing under the assumptions that $Q$ is characterized by the triple $(\eta,\sigma,\nu)$
and that $Q_n^{*n} \to Q$, we use Corollary 23 of Chapter 16 to obtain

$$\lim_{n\to\infty} n \int_{[-y,y]} t^2 \, Q_n(dt) = \sigma^2 + \int_{[-y,y]\setminus\{0\}} t^2 \, \nu(dt) \tag{17.7}$$

for those $y > 0$ for which $\nu\{-y,y\} = 0$. We make the change of variables
$t = (s - c_n)/a_n$ in the left side of (17.7) and then use the third conclusion in
Lemma 5 in conjunction with $\nu\{-y,y\} = 0$ to obtain

$$\lim_{n\to\infty} \frac{n}{a_n^2} \int_{[-a_n y, a_n y]} s^2 \, R(ds) = \sigma^2 + \int_{[-y,y]\setminus\{0\}} t^2 \, \nu(dt). \tag{17.8}$$


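Fix $z > 0$ such that $\nu\{-z,z\} = 0$ and

$$\sigma^2 + \int_{[-z,z]\setminus\{0\}} t^2 \, \nu(dt) > 0.$$

For $y > 0$ such that $\nu\{-yz,yz\} = 0$, (17.8) and the second conclusion of
Lemma 5 yield

$$\lim_{n\to\infty} \frac{\int_{[-a_n yz, a_n yz]} s^2 \, R(ds)}{\int_{[-a_{n+1} z, a_{n+1} z]} s^2 \, R(ds)} = \frac{\sigma^2 + \int_{[-yz,yz]\setminus\{0\}} t^2 \, \nu(dt)}{\sigma^2 + \int_{[-z,z]\setminus\{0\}} t^2 \, \nu(dt)}$$

and

$$\lim_{n\to\infty} \frac{\int_{[-a_{n+1} yz, a_{n+1} yz]} s^2 \, R(ds)}{\int_{[-a_n z, a_n z]} s^2 \, R(ds)} = \frac{\sigma^2 + \int_{[-yz,yz]\setminus\{0\}} t^2 \, \nu(dt)}{\sigma^2 + \int_{[-z,z]\setminus\{0\}} t^2 \, \nu(dt)}.$$

By the first conclusion of Lemma 5, every sufficiently large $w$ lies in $[a_n, a_{n+1}]$
for some $n$. Hence,

$$\lim_{w\to\infty} \frac{\int_{[-wyz, wyz]} s^2 \, R(ds)}{\int_{[-wz, wz]} s^2 \, R(ds)} = \frac{\sigma^2 + \int_{[-yz,yz]\setminus\{0\}} t^2 \, \nu(dt)}{\sigma^2 + \int_{[-z,z]\setminus\{0\}} t^2 \, \nu(dt)}. \tag{17.9}$$

Therefore, the function

$$y \rightsquigarrow \int_{[-yz,yz]} s^2 \, R(ds)$$

is regularly varying at $\infty$ and, by Proposition 3,

$$\frac{\sigma^2 + \int_{[-yz,yz]\setminus\{0\}} t^2 \, \nu(dt)}{\sigma^2 + \int_{[-z,z]\setminus\{0\}} t^2 \, \nu(dt)} = y^{\kappa} \tag{17.10}$$

for some $\kappa \ge 0$. Moreover, (17.9) holds for every $z > 0$.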

By letting $z \searrow 0$ in (17.10) we see that $\sigma > 0$ implies that $\kappa = 0$ and hence
that $\nu$ is the 0 measure. In this case, a look at the characteristic function reveals
that $Q$ is a normal distribution.

For the remainder of this argument, assume $\sigma = 0$. Set $z = 1$ to obtain

$$\int_{[-y,y]\setminus\{0\}} t^2 \, \nu(dt) = \lambda y^{\kappa} \tag{17.11}$$

for some constant $\lambda > 0$. If $\nu(0,\infty) > 0$, we may repeat the preceding arguments
with the lower endpoints of integration replaced by 0; if $\nu(-\infty,0) > 0$ they can
be repeated with the upper endpoints of integration replaced by 0. The result
is that there exist positive constants $\kappa^{+}$ and $\kappa^{-}$ and nonnegative constants $\lambda^{+}$
and $\lambda^{-}$ such that

$$\int_{(0,y]} t^2 \, \nu(dt) = \lambda^{+} y^{\kappa^{+}} \tag{17.12}$$

and

$$\int_{[-y,0)} t^2 \, \nu(dt) = \lambda^{-} y^{\kappa^{-}}. \tag{17.13}$$

The value of $\kappa^{+}$ is arbitrary if $\lambda^{+} = 0$; we choose $\kappa^{+} = \kappa$ in this case. Similarly,
we set $\kappa^{-} = \kappa$ if $\lambda^{-} = 0$. With these conventions it is straightforward to use the
fact that (17.11), (17.12), and (17.13) hold for all $y > 0$ to conclude first that
$\kappa^{+} = \kappa^{-} = \kappa$ and $\lambda = \lambda^{+} + \lambda^{-}$, and second that $\kappa < 2$ (so that $\nu$ is a Lévy
measure). We may further conclude that $\nu$ is given by

$$\nu(y,\infty) = \frac{\lambda^{+}\kappa}{2-\kappa} \, y^{-(2-\kappa)}, \quad y > 0, \tag{17.14}$$

and

$$\nu(-\infty,y) = \frac{\lambda^{-}\kappa}{2-\kappa} \, |y|^{-(2-\kappa)}, \quad y < 0. \tag{17.15}$$
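As a consistency check on the tail formula (17.14), one can differentiate the tail to recover the density of $\nu$ on $(0,\infty)$ and then integrate $t^2$ against it. In the sketch below $\lambda^{+}$ denotes the constant in (17.12) and $\kappa \in (0,2)$ the common exponent; the computation on $(-\infty,0)$ is the mirror image.

```latex
\begin{align*}
\nu(dt) &= -\frac{d}{dt}\Bigl[\frac{\lambda^{+}\kappa}{2-\kappa}\,t^{-(2-\kappa)}\Bigr]dt
         = \lambda^{+}\kappa\,t^{-(3-\kappa)}\,dt, \qquad t>0,\\
\int_{(0,y]} t^{2}\,\nu(dt) &= \lambda^{+}\kappa\int_{0}^{y} t^{\kappa-1}\,dt
         = \lambda^{+}y^{\kappa}, \qquad y>0.
\end{align*}
```

The second line uses $\kappa > 0$ for integrability at 0, and the finiteness of $\nu(y,\infty)$ in the first line uses $\kappa < 2$.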


We have proved the following lemma.


Lemma 6. In order that an infinitely divisible distribution corresponding to
the triple $(\eta,\sigma,\nu)$ be a nondegenerate stable distribution it is necessary that ex-
actly one of the following two conditions be satisfied: (i) $\nu = 0$; (ii) $\sigma = 0$ and
$\nu$ is given by (17.14) and (17.15) for some constants $\kappa \in (0,2)$, $\lambda^{+} \ge 0$, and
$\lambda^{-} \ge 0$, with $\lambda^{+} + \lambda^{-} > 0$.


In the course of this section it will develop that 'necessary' in the preceding
lemma can be replaced by 'necessary and sufficient'. For the $\mathbb{R}^{+}$-setting,
the full story is immediately available.


Problem 5. Use Problem 13 of Chapter 15, the Lévy-Khinchin Representation
Theorem for $\mathbb{R}^{+}$, and Lemma 6 to conclude that the stable moment generating
functions are those of the form

$$v \rightsquigarrow e^{-\xi v - c v^{\alpha}}$$

for some $\alpha \in (0,1]$, $c \in [0,\infty)$, and $\xi \in [0,\infty)$, and that of these, the strictly stable
ones are those with $\xi = 0$. [Note: the correspondence as described is not one-to-
one, so for instance, the strictly stable moment generating function obtained by
setting $\xi = 0$, $c = 1$, and $\alpha = 1$ can also be obtained by setting $\xi = 1$, $c = 0$, and
choosing $\alpha$ arbitrarily.]




In view of (17.14) and (17.15), the change of variables $\alpha = 2 - \kappa$ is natu-
ral. With this definition of $\alpha$, the stable distributions described in case (ii) of
Lemma 6 are said to have index $\alpha$, necessarily belonging to $(0,2)$. A stable dis-
tribution for which $\sigma > 0$ (that is, a nondegenerate normal distribution) is said
to have index 2. The index of the delta distribution at 0 is not defined. Other
delta distributions also have no index in the stable setting, but have index 1 in
the strictly stable setting.

Lemma 6 identifies all the stable distributions in terms of the corresponding
triples $(\eta,\sigma,\nu)$ in the Lévy-Khinchin Representation Theorem. In the remain-
der of this section we will explicitly calculate the corresponding characteristic
exponents and also identify which of these exponents are strictly stable. Before
proceeding, however, we record, for use in the next section, necessary conditions
for $R$ to be in the domain of attraction of a stable distribution of a particu-
lar index, conditions that are contained explicitly or implicitly in the discussion
leading to Lemma 6.


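Lemma 7. If a distribution $R$ is in the domain of attraction of a stable dis-
tribution of index $\alpha$, then the function

$$y \rightsquigarrow \int_{[-y,y]} s^2 \, R(ds) \tag{17.16}$$

is regularly varying at $\infty$ of index $2 - \alpha$ and, in case $\alpha \ne 2$,

$$\lim_{y\to\infty} \frac{\int_{(0,y]} s^2 \, R(ds)}{\int_{[-y,y]} s^2 \, R(ds)} = \frac{\lambda^{+}}{\lambda}$$

and

$$\lim_{y\to\infty} \frac{\int_{[-y,0)} s^2 \, R(ds)}{\int_{[-y,y]} s^2 \, R(ds)} = \frac{\lambda^{-}}{\lambda}.$$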


Problem 6. Identify those aspects of the discussion leading to Lemma 6 that con- 
stitute a proof of the preceding lemma. 


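Problem 7. Prove the following relationships between the constants $\lambda^{+}$, $\lambda^{-}$, $\lambda$, and the
Lévy measure $\nu$ of a stable distribution:

$$\frac{\nu(c,\infty)}{\nu(-\infty,-c) + \nu(c,\infty)} = \frac{\lambda^{+}}{\lambda} \quad \text{and} \quad \frac{\nu(-\infty,-c)}{\nu(-\infty,-c) + \nu(c,\infty)} = \frac{\lambda^{-}}{\lambda}$$

for $c > 0$.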


Problem 8. Let $\beta > \alpha \in (0,2)$ and suppose that the function defined at (17.16)
is regularly varying at $\infty$ of index $2 - \alpha$. Show that $\int |s|^{\beta} \, R(ds) = \infty$. Hint: Use
Lemma 4.


* Problem 9. Let $0 < \beta < \alpha \in (0,2]$ and suppose that the function defined at (17.16)
is regularly varying at $\infty$ of index $2 - \alpha$. Show that $\int |s|^{\beta} \, R(ds) < \infty$. Hint: Use
Lemma 4.




Problem 10. Let $\alpha \in (0,2)$. Show that the assumption that the function defined
at (17.16) is regularly varying at $\infty$ of index $2 - \alpha$ does not enable one to draw a
conclusion about the finiteness of $\int |s|^{\alpha} \, R(ds)$.


Problem 11. Let $\beta > 2$. Show that the assumption that the function defined at
(17.16) is slowly varying at $\infty$ does not enable one to draw a conclusion about the
finiteness of $\int |s|^{\beta} \, R(ds)$.


The remainder of this section is devoted to the explicit calculation of the 
stable characteristic exponents. We already mentioned the case a = 2 in the 
preceding discussion, but for the record, we make a formal statement here. This 
result is also found in Problem 9 of Chapter 16. 


Theorem 8. The stable distributions of index 2 are the normal distributions.
The characteristic exponents are of the form $-i\eta u + (\sigma^2/2)u^2$, $\eta \in \mathbb{R}$, $\sigma^2 > 0$.
Of these, the strictly stable distributions are those with mean $\eta = 0$.


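Now we focus on the remaining values of $\alpha$. In view of (17.14) and (17.15),
the density with respect to Lebesgue measure of the Lévy measure of a stable
distribution with index $\alpha \in (0,2)$ is of the form

$$y \rightsquigarrow |y|^{-(\alpha+1)} \bigl( k^{+} I_{(0,\infty)}(y) + k^{-} I_{(-\infty,0)}(y) \bigr),$$

where $k^{+}, k^{-}$ are nonnegative constants. Since the characteristic exponent is
given by

$$\int_{\mathbb{R}\setminus\{0\}} (1 - \cos uy - i\sin uy + iu\chi(y)) \, \nu(dy),$$

the following formula is of interest:

$$\int_0^{\infty} (1 - \cos uy) \, y^{-(\alpha+1)} \, dy = |u|^{\alpha} \int_0^{\infty} \frac{1 - \cos w}{w^{\alpha+1}} \, dw = \frac{\pi \csc\frac{\pi\alpha}{2}}{2\Gamma(\alpha+1)} \, |u|^{\alpha}, \tag{17.17}$$

where $\Gamma$ denotes the gamma function.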
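The closed form in (17.17) can be checked numerically for $u = 1$. The sketch below is an assumption-laden illustration, not part of the derivation: it applies the midpoint rule on $(0, T]$ with $T$ a multiple of $2\pi$ and approximates the tail by $\int_T^\infty w^{-(\alpha+1)}\,dw = T^{-\alpha}/\alpha$, neglecting the lower-order oscillatory part. The function names are hypothetical, and the crude quadrature is only reliable when $\alpha$ is not close to 2, where the integrand is singular at 0.

```python
import math


def truncated_cosine_integral(alpha: float, upper: float, steps: int) -> float:
    """Midpoint-rule value of the integral of (1 - cos w) / w**(alpha + 1) on (0, upper]."""
    h = upper / steps
    total = 0.0
    for j in range(steps):
        w = (j + 0.5) * h
        total += (1.0 - math.cos(w)) / w ** (alpha + 1.0)
    return total * h


def lhs(alpha: float, periods: int = 50, steps_per_period: int = 2000) -> float:
    """Approximate integral over (0, infinity): midpoint rule plus leading tail term."""
    upper = 2.0 * math.pi * periods
    body = truncated_cosine_integral(alpha, upper, periods * steps_per_period)
    tail = upper ** (-alpha) / alpha  # the cosine part of the tail is of lower order
    return body + tail


def rhs(alpha: float) -> float:
    """Right side of (17.17) with u = 1: pi * csc(pi * alpha / 2) / (2 * Gamma(alpha + 1))."""
    return math.pi / (2.0 * math.gamma(alpha + 1.0) * math.sin(math.pi * alpha / 2.0))


for alpha in (0.5, 1.0):
    print(alpha, lhs(alpha), rhs(alpha))
```

For $\alpha = 1$ the right side reduces to $\pi/2$, the familiar value of $\int_0^\infty (1 - \cos w)/w^2\,dw$.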


Problem 12. Check the first equality in (17.17) by using substitution and the sec-
ond by using tables and whatever calculations are necessary in conjunction with the
tables. Hint: It may be that the tables force consideration of three cases separately:
$\alpha \in (0,1)$, $\alpha = 1$, and $\alpha \in (1,2)$.


For $0 < \alpha < 1$, the integral of the $-i\sin uy$ term is finite. It can be calculated
by replacing $\alpha$ by $\alpha + 1$ in (17.17) and differentiating with respect to $u$:

$$\int_0^{\infty} (-i\sin uy) \, y^{-(\alpha+1)} \, dy = -\frac{i\pi \sec\frac{\pi\alpha}{2}}{2\Gamma(\alpha+1)} \, |u|^{\alpha} \operatorname{sgn}(u), \tag{17.18}$$

valid for $0 < \alpha < 1$. In this case, the integral of the $iu\chi(y)$ term is also finite
and can thus be included with the $-i\eta u$ part of the characteristic exponent to
give a term $-i\xi u$ for some $\xi \in \mathbb{R}$.

For $1 < \alpha < 2$, the integral of the $-i\sin uy$ term is not finite; the $iu\chi(y)$ term
must be included. To obtain a nice formula, it is convenient to add and subtract
$iuy$, and then to split the integral into two pieces. The formula for the first piece
is

$$\int_0^{\infty} i(-\sin uy + uy) \, y^{-(\alpha+1)} \, dy = -\frac{i\pi \sec\frac{\pi\alpha}{2}}{2\Gamma(\alpha+1)} \, |u|^{\alpha} \operatorname{sgn}(u), \tag{17.19}$$

valid for $1 < \alpha < 2$. This formula is proved by replacing $\alpha$ by $\alpha - 1$ in (17.17) and
integrating with respect to $u$. Note that the right sides of (17.18) and (17.19)
are identical. The second piece is the integral of $iu(\chi(y) - y)$ with respect to $\nu$.
It gives a finite multiple of $iu$ and can thus be included with the $-i\eta u$ term to
give a term $-i\xi u$ for some $\xi \in \mathbb{R}$.

If we combine (17.17), (17.18), and (17.19), and do a little rearranging of
constants, we obtain the following theorem.


Theorem 9. The stable characteristic exponents of index $\alpha \in (0,1) \cup (1,2)$
are the functions of the form

$$u \rightsquigarrow -i\xi u + k|u|^{\alpha} \bigl( 1 - i\gamma(\tan\tfrac{\pi\alpha}{2}) \operatorname{sgn}(u) \bigr),$$

where $k > 0$, $\xi \in \mathbb{R}$, and $|\gamma| \le 1$. Of these, the strictly stable characteristic
exponents are those for which $\xi = 0$.
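The form in Theorem 9 is simple to code, and doing so makes the role of $\xi$ visible: when $\xi = 0$ the exponent $\psi$ satisfies $n\psi(u) = \psi(n^{1/\alpha}u)$, which is the scaling one expects of strict stability. The function name `stable_exponent` below is hypothetical, and the sketch assumes the convention in which the characteristic function is $\exp(-\psi)$.

```python
import math


def stable_exponent(u: float, alpha: float, k: float, gamma: float, xi: float = 0.0) -> complex:
    """Characteristic exponent -i*xi*u + k*|u|**alpha * (1 - i*gamma*tan(pi*alpha/2)*sgn(u))
    for index alpha in (0,1) or (1,2), with k > 0 and |gamma| <= 1."""
    if not (0.0 < alpha < 2.0) or alpha == 1.0:
        raise ValueError("alpha must lie in (0,1) or (1,2)")
    if k <= 0.0 or abs(gamma) > 1.0:
        raise ValueError("need k > 0 and |gamma| <= 1")
    if u == 0.0:
        return 0j
    sgn = 1.0 if u > 0 else -1.0
    skew = gamma * math.tan(math.pi * alpha / 2.0) * sgn
    return -1j * xi * u + k * abs(u) ** alpha * (1.0 - 1j * skew)


alpha, k, gamma, n, u = 1.5, 2.0, 0.3, 5, 0.7
strict_gap = abs(n * stable_exponent(u, alpha, k, gamma)
                 - stable_exponent(n ** (1.0 / alpha) * u, alpha, k, gamma))
shifted_gap = abs(n * stable_exponent(u, alpha, k, gamma, xi=1.0)
                  - stable_exponent(n ** (1.0 / alpha) * u, alpha, k, gamma, xi=1.0))
print(strict_gap, shifted_gap)
```

With $\xi \ne 0$ the two sides differ by a term linear in $u$, which corresponds to the centering constants needed in the stable, but not strictly stable, case.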


Problem 13. Prove the preceding theorem. Hint: Formulas (17.17), (17.18), and
(17.19) involve integrals over $(0,\infty)$. Similar formulas hold for the region $(-\infty,0)$.
The constant $\gamma$ comes from the fact that the constants in the Lévy measure may
be different on these two regions.


For the index $\alpha = 1$, the following integral is relevant:

$$\int_0^{\infty} \frac{-\sin uy + u\chi(y)}{y^2} \, dy = u \int_0^{\infty} \frac{-\sin z + |u|\chi(z/|u|)}{z^2} \, dz = u \int_0^{\infty} \frac{-\sin z + \chi(z)}{z^2} \, dz + u \int_0^{\infty} \frac{|u|\chi(z/|u|) - \chi(z)}{z^2} \, dz,$$

the splitting of the integral into two pieces being valid because each of the pieces
is finite. The first of the two pieces, when multiplied by $i$, can be combined with
$-i\eta u$ to give a term $-i\xi u$ for some $\xi \in \mathbb{R}$. The second piece can be evaluated
explicitly; it equals $u\log|u|$, which we take to equal 0 at $u = 0$.


Theorem 10. The stable characteristic exponents of index 1 are the functions
of the form

$$u \rightsquigarrow -i\xi u + k|u| \bigl( 1 + i\gamma(\tfrac{2}{\pi}\log|u|) \operatorname{sgn}(u) \bigr),$$

where $k > 0$, $\xi \in \mathbb{R}$, and $|\gamma| \le 1$. Of these, the strictly stable characteristic
exponents are those for which $\gamma = 0$. Moreover, those for which $k = 0$ and $\xi \ne 0$
are also strictly stable of index 1.


Problem 14. Prove the preceding theorem. 


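In Theorem 9 and Theorem 10, parameters $k$, $\gamma$, and $\xi$ appear. The proofs of
these theorems show the following relationships of these parameters to $(\eta,\sigma,\nu)$.


Proposition 11. For a stable distribution as described in Theorem 9 or The-
orem 10, let $(\eta,\sigma,\nu)$ be the triple for the corresponding Lévy-Khinchin represen-
tation. Then $k$, $\gamma$, and $\xi$, of those theorems, are given by

$$k = \frac{\pi \csc\frac{\pi\alpha}{2}}{2\Gamma(\alpha)} \bigl( \nu(1,\infty) + \nu(-\infty,-1) \bigr);$$

$$\gamma = \frac{\nu(1,\infty) - \nu(-\infty,-1)}{\nu(1,\infty) + \nu(-\infty,-1)};$$

$$\xi = \eta - \begin{cases} \int_{\mathbb{R}\setminus\{0\}} \chi(y) \, \nu(dy) & \text{if } \alpha \in (0,1) \\ \int_{\mathbb{R}\setminus\{0\}} [\chi(y) - \sin y] \, \nu(dy) & \text{if } \alpha = 1 \\ \int_{\mathbb{R}\setminus\{0\}} [\chi(y) - y] \, \nu(dy) & \text{if } \alpha \in (1,2) \end{cases}$$

$$\phantom{\xi} = \eta - \begin{cases} \dfrac{\nu(1,\infty) - \nu(-\infty,-1)}{1-\alpha} & \text{if } \alpha \in (0,1) \cup (1,2) \\ \bigl( \nu(1,\infty) - \nu(-\infty,-1) \bigr) \displaystyle\int_0^{\infty} \frac{\chi(y) - \sin y}{y^2} \, dy & \text{if } \alpha = 1. \end{cases}$$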


* Problem 15. What is the index of a Cauchy distribution that is symmetric about 
0? 


Problem 16. Show that each strictly stable distribution different from a delta dis- 
tribution has a continuous density with respect to Lebesgue measure and calculate 
the value at 0 of that density. 


* Problem 17. For each strictly stable distribution $Q$ calculate $Q(0,\infty)$. For each
fixed index $\alpha$ calculate the maximum value of $Q(0,\infty)$ as the values of the other
parameters vary. Hint: Use an inversion formula to find an expression for $Q(0, x)$.


17.3. † Domains of attraction


The following theorem identifies the domains of attraction for nondegenerate
stable distributions. Only one distribution from each stable type is mentioned
since the domains of attraction of all stable distributions of any one type are
identical. At some place or other in the theorem a constant, depending on $\alpha$,
must appear. For that purpose we set

\[
k_\alpha = \begin{cases}
 \dfrac{\pi(2-\alpha)\csc\frac{\pi\alpha}{2}}{2\alpha\,\Gamma(\alpha)} & \text{if } \alpha \in (0,2) \\[1ex]
 \dfrac{1}{2} & \text{if } \alpha = 2 ,
\end{cases}
\tag{17.20}
\]

where $\Gamma$ denotes the gamma function. The theorem also gives formulas for
relevant normalizing constants. When reading the theorem it should be noticed
that the constant $\gamma$ is irrelevant when $\alpha = 2$.
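
As a quick sanity check on the constant, the sketch below evaluates $k_\alpha$ under the assumption that (17.20) reads $k_\alpha = \pi(2-\alpha)\csc(\pi\alpha/2)/(2\alpha\Gamma(\alpha))$ for $\alpha \in (0,2)$ and $k_2 = 1/2$. The function name `k_alpha` is hypothetical. The values at $\alpha = 1/2$ and $\alpha = 1$ match the constants $3\sqrt{2\pi}/2$ and $\pi/2$ that appear later in this chapter, and the formula is continuous at $\alpha = 2$.

```python
import math


def k_alpha(alpha: float) -> float:
    """Constant k_alpha, assuming the form of (17.20) stated in the lead-in."""
    if not 0.0 < alpha <= 2.0:
        raise ValueError("alpha must lie in (0, 2]")
    if alpha == 2.0:
        return 0.5
    return (math.pi * (2.0 - alpha)
            / (2.0 * alpha * math.gamma(alpha) * math.sin(math.pi * alpha / 2.0)))


print(k_alpha(0.5), 1.5 * math.sqrt(2.0 * math.pi))  # index 1/2
print(k_alpha(1.0), math.pi / 2.0)                   # index 1
print(k_alpha(2.0 - 1e-6), k_alpha(2.0))             # continuity at index 2
```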


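Theorem 12. Let $\alpha \in (0,2]$, $\gamma \in [-1,1]$, and $Q$ be the stable distribution
whose characteristic exponent is the function

\[
u \mapsto \begin{cases}
 k_\alpha |u|^\alpha \bigl(1 - i\gamma(\tan\tfrac{\pi\alpha}{2})\operatorname{sgn}(u)\bigr) & \text{if } \alpha \ne 1 \\
 k_1 |u| \bigl(1 + i\gamma\,\tfrac{2}{\pi}(\log|u|)\operatorname{sgn}(u)\bigr) & \text{if } \alpha = 1 ,
\end{cases}
\]

where $k_\alpha$ is defined by (17.20). Then a distribution $R$ is in the domain of
attraction of $Q$ if and only if the function

\[
y \mapsto \int_{[-y,y]} s^2\,R(ds)
\tag{17.21}
\]

is regularly varying at $\infty$ of index $2 - \alpha$ and, in case $\alpha \ne 2$,

\[
\lim_{y\to\infty} \frac{\int_{[0,y]} s^2\,R(ds)}{\int_{[-y,y]} s^2\,R(ds)} = \frac{1+\gamma}{2} .
\tag{17.22}
\]

If $R$ is in the domain of attraction of $Q$, then $\lim_{n\to\infty} Q_n^{*n} = Q$, where $Q_n$ is
defined by $Q_n(B) = R(a_n B + c_n)$ with

\[
a_n = \inf\Bigl\{ y > 0 : \frac{1}{y^2}\int_{[-y,y]} s^2\,R(ds) \le \frac{1}{n} \Bigr\}
\tag{17.23}
\]

and

\[
c_n = \begin{cases}
 0 & \text{if } \alpha < 1 \\
 a_n \int \sin\frac{s}{a_n}\,R(ds) & \text{if } \alpha = 1 \\
 \int s\,R(ds) & \text{if } \alpha > 1 .
\end{cases}
\tag{17.24}
\]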


PARTIAL PROOF. The necessity of the regular variation condition is the content
of Lemma 7, and the necessity of (17.22) follows immediately from Proposition 11
and Lemma 7. So, for the remainder of the proof we assume that the
function (17.21) is regularly varying at $\infty$ of some index $2 - \alpha$ with $\alpha \in (0,2]$
and, in case $\alpha \ne 2$, that (17.22) holds for some $\gamma$.

For the purposes of obtaining information about the constants $a_n$ and $c_n$
defined in the statement of the theorem, let us consider the properties of the
function

\[
y \mapsto \frac{1}{y^2}\int_{[-y,y]} s^2\,R(ds) .
\]

It is easy to see that this function is right continuous and has limits from the
left at each point $y$. Furthermore, the limit from the left at $y$ is always less than
or equal to the value at $y$. By Lemma 4 the function converges to 0 as $y \to \infty$.
It follows from these facts that $a_n \nearrow \infty$ as $n \nearrow \infty$, and that for all sufficiently
large $n$,

\[
\frac{n}{a_n^2}\int_{[-a_n,a_n]} s^2\,R(ds) = 1 .
\tag{17.25}
\]

By Problem 9 the definition of $c_n$ is meaningful. Also, $c_n/a_n \to 0$ as $n \to \infty$, a fact
that will be used below. We will finish the proof by verifying (16.22), (16.23),
and (16.24) in Corollary 23 of Chapter 16, thereby establishing that $Q_n^{*n} \to Q$
as $n \to \infty$ (where $Q_n$ is defined in terms of $R$, $a_n$, and $c_n$ as in the statement of
the theorem).

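With the goal of proving (16.22) we first fix $y > 0$ and, for all sufficiently
large $n$ (depending on $y$), we use (17.25) to obtain

\[
\begin{aligned}
nQ_n(y,\infty) &= nR(a_n y + c_n, \infty) \\
 &\le nR(a_n y/2, \infty) \\
 &= \sum_{m=0}^\infty nR(a_n y 2^{m-1}, a_n y 2^m] \\
 &\le \sum_{m=0}^\infty \frac{4n}{(a_n y 2^m)^2}\int_{[0,a_n y 2^m]} s^2\,R(ds) \\
 &= \sum_{m=0}^\infty \frac{4}{(y 2^m)^2}\,
   \frac{\int_{[0,a_n y 2^m]} s^2\,R(ds)}{\int_{[-a_n,a_n]} s^2\,R(ds)} \\
 &= \frac{\int_{[0,a_n y]} s^2\,R(ds)}{\int_{[-a_n,a_n]} s^2\,R(ds)}
   \sum_{m=0}^\infty \frac{4}{(y 2^m)^2}
   \prod_{i=1}^m \frac{\int_{[0,a_n y 2^i]} s^2\,R(ds)}{\int_{[0,a_n y 2^{i-1}]} s^2\,R(ds)} .
\end{aligned}
\tag{17.26}
\]

By the regular variation hypothesis, the factors in the product $\prod_{i=1}^m$ in (17.26)
approach $2^{2-\alpha}$ uniformly in $i$, $1 \le i < \infty$, as $n \to \infty$. Therefore, for fixed
$\beta \in (0,\alpha)$ and all sufficiently large $n$, not depending on $m$, this product is less
than $2^{(2-\beta)m}$. Thus, (17.26) is bounded above by

\[
\frac{\int_{[0,a_n y]} s^2\,R(ds)}{\int_{[-a_n,a_n]} s^2\,R(ds)}
 \sum_{m=0}^\infty \frac{4}{y^2 2^{\beta m}}
 = \frac{\int_{[0,a_n y]} s^2\,R(ds)}{\int_{[-a_n,a_n]} s^2\,R(ds)} \cdot \frac{4}{y^2(1 - 2^{-\beta})}
\tag{17.27}
\]

for sufficiently large $n$. As $n \to \infty$ the regular variation condition and (17.22)
give

\[
\lim_{n\to\infty} \frac{\int_{[0,a_n y]} s^2\,R(ds)}{\int_{[-a_n,a_n]} s^2\,R(ds)}
 = \lim_{n\to\infty} \frac{\int_{[0,a_n y]} s^2\,R(ds)}{\int_{[-a_n y,a_n y]} s^2\,R(ds)}
   \cdot \frac{\int_{[-a_n y,a_n y]} s^2\,R(ds)}{\int_{[-a_n,a_n]} s^2\,R(ds)}
 = \frac{1+\gamma}{2}\,y^{2-\alpha} .
\tag{17.28}
\]

This calculation shows that the quantity in (17.27) goes to 0 as first $n \to \infty$ and
then $y \to \infty$. Since a similar argument works for $nQ_n(-\infty,x)$ when $x \to -\infty$,
(16.22) holds.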

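In order to prove (16.24), we first suppose that $\alpha < 2$. By (17.25) and (17.28),
for $y > 0$,

\[
\lim_{n\to\infty} \frac{n}{a_n^2}\int_{[0,a_n y]} s^2\,R(ds)
 = \lim_{n\to\infty} \frac{\int_{[0,a_n y]} s^2\,R(ds)}{\int_{[-a_n,a_n]} s^2\,R(ds)}
 = \frac{1+\gamma}{2}\,y^{2-\alpha} .
\]

A similar argument for intervals $[a_n x, 0]$ gives $\frac{1-\gamma}{2}|x|^{2-\alpha}$. Thus,

\[
\lim_{n\to\infty} \frac{n}{a_n^2}\int_{[a_n x,a_n y]} s^2\,R(ds)
 = \operatorname{sgn}(y)\,\frac{1+\gamma\operatorname{sgn}(y)}{2}\,|y|^{2-\alpha}
 - \operatorname{sgn}(x)\,\frac{1+\gamma\operatorname{sgn}(x)}{2}\,|x|^{2-\alpha}
\tag{17.29}
\]

for all $x < y$. A slightly different, somewhat simpler argument shows (17.29) to
be valid for $\alpha = 2$ provided that neither $x$ nor $y$ equals 0. Using Proposition 11
and a little straightforward calculation to replace the right side of (17.29) by an
expression in terms of $\sigma$ and $\nu$ gives

\[
\lim_{n\to\infty} \frac{n}{a_n^2}\int_{[a_n x,a_n y]} s^2\,R(ds)
 = \sigma^2\delta_0([x,y]) + \int_{[x,y]\setminus\{0\}} t^2\,\nu(dt) ,
\]

where $\delta_0$ denotes the delta distribution at 0. Since $c_n/a_n \to 0$, the substitution
$s = a_n t + c_n$ gives

\[
\lim_{n\to\infty} n\int_{[x,y]} t^2\,Q_n(dt)
 = \sigma^2\delta_0([x,y]) + \int_{[x,y]\setminus\{0\}} t^2\,\nu(dt)
\tag{17.30}
\]

provided that neither $x$ nor $y$ equals 0. Thus (16.24) is satisfied.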
Before continuing with the proof, it will be useful to mention one consequence
of (17.30). Fix an interval $[x,y]$. Then

\[
B \mapsto n\int_B t^2\,Q_n(dt)
\]

defines a finite measure $\mu_n$ on the Borel subsets of $[x,y]$. It is an easy exercise
to extend the notion of convergence in distribution to the sequence
$(\mu_n : n = 1,2,\dots)$, since $\sup_n \mu_n[x,y]$ is finite. We may conclude from (17.30) that if
$f\colon [x,y] \to \mathbb{R}$ is continuous and neither $x$ nor $y$ is 0, then

\[
\lim_{n\to\infty} n\int_{[x,y]} f(t)\,t^2\,Q_n(dt) = \int_{[x,y]\setminus\{0\}} f(t)\,t^2\,\nu(dt) .
\tag{17.31}
\]

We will use this fact several times in the rest of the proof.
It remains to consider (16.23) of Chapter 16. We first treat $\alpha \in (0,1)$ in which
case $\eta = \int \chi\,d\nu$, by Proposition 11. Thus, (16.23) is equivalent to

\[
n\int \chi\,dQ_n \to \int \chi\,d\nu .
\]

In view of the boundedness of the function $\chi$ and the fact that (16.22) has already
been proved, it is enough to prove that

\[
n\int_{[x,y]} \chi(t)\,Q_n(dt) \to \int_{[x,y]} \chi(t)\,\nu(dt)
\]

for all real $x < y$. If $0 \notin [x,y]$, then we may multiply and divide by $t^2$ and apply
(17.31) to see that

\[
n\int_{[x,y]} \chi\,dQ_n = n\int_{[x,y]} \frac{\chi(t)}{t^2}\,t^2\,Q_n(dt)
 \to \int_{[x,y]} \frac{\chi(t)}{t^2}\,t^2\,\nu(dt) = \int_{[x,y]} \chi\,d\nu .
\]

We have used the fact that $t \mapsto \chi(t)/t^2$ is a continuous function on such an
interval. To take care of intervals $[x,y]$ containing 0, it is enough to prove that

\[
\lim_{y\searrow 0}\,\limsup_{n\to\infty}\,\Bigl| n\int_{[-y,y]} t\,Q_n(dt) \Bigr| = 0 .
\tag{17.32}
\]

The proof of this fact is carried out in two parts, one part for the interval $[0,y]$
and the other for the interval $[-y,0]$. The arguments are similar for the two
parts, so we concentrate on the interval $[0,y]$. Written in terms of $R$, we need
to prove

\[
\lim_{y\searrow 0}\,\limsup_{n\to\infty}\,\frac{n}{a_n}\int_{[0,a_n y]} s\,R(ds) = 0 .
\tag{17.33}
\]

For the proof of (17.33) the following equalities and inequalities are useful:

\[
\begin{aligned}
\frac{n}{a_n}\int_{[0,a_n y]} s\,R(ds)
 &= \sum_{m=0}^\infty \frac{n}{a_n}\int_{(a_n y 2^{-(m+1)},\,a_n y 2^{-m}]} s\,R(ds) \\
 &\le \sum_{m=0}^\infty \frac{n 2^{m+1}}{a_n^2\,y}\int_{[0,a_n y 2^{-m}]} s^2\,R(ds) \\
 &= \sum_{m=0}^\infty \frac{2^{m+1}}{y}\,
   \frac{\int_{[0,a_n y 2^{-m}]} s^2\,R(ds)}{\int_{[-a_n,a_n]} s^2\,R(ds)} .
\end{aligned}
\]

It is left for the reader to now prove (17.33).
Next we consider $\alpha \in (1,2]$. In this case $\eta = \int [\chi(t) - t]\,\nu(dt)$ by Proposition 11.
Thus, (16.23) is equivalent to

\[
n\int \chi(t)\,Q_n(dt) \to \int_{\mathbb{R}\setminus\{0\}} [\chi(t) - t]\,\nu(dt) .
\]

The mean of $Q_n$ is 0, so we obtain the following equivalent statement as the one
we need to prove:

\[
n\int [\chi(t) - t]\,Q_n(dt) \to \int_{\mathbb{R}\setminus\{0\}} [\chi(t) - t]\,\nu(dt) .
\]

As in the previous case, we multiply and divide by $t^2$. The function
$t \mapsto (\chi(t) - t)/t^2$, defined at 0 by continuity, is continuous on every bounded
interval. It follows from (17.31) that

\[
n\int_{[x,y]} [\chi(t) - t]\,Q_n(dt) \to \int_{[x,y]\setminus\{0\}} [\chi(t) - t]\,\nu(dt)
\]

for real numbers $x, y$ not equal to 0. To treat the behavior near $+\infty$, it suffices
to show that

\[
\lim_{y\nearrow\infty}\,\limsup_{n\to\infty}\, n\int_{(y,\infty)} [\chi(t) - t]\,Q_n(dt) = 0 ,
\tag{17.34}
\]

and a similar assertion near $-\infty$. It is left for the reader to prove (17.34) by
a method similar to the proof above that (16.22) holds, and to observe that a
similar argument would also work near $-\infty$.

Finally we treat (16.23) for the case $\alpha = 1$. In view of Proposition 11 we want
to prove that

\[
n\int \chi(t)\,Q_n(dt) \to \int_{\mathbb{R}\setminus\{0\}} [\chi(t) - \sin t]\,\nu(dt) ,
\]

which we rewrite as

\[
n\int [\chi(t) - \sin t]\,Q_n(dt) + n\int \sin t\,Q_n(dt)
 \to \int_{\mathbb{R}\setminus\{0\}} [\chi(t) - \sin t]\,\nu(dt) .
\tag{17.35}
\]

Since $t \mapsto (\chi(t) - \sin t)/t^2$, defined at 0 by continuity, is a continuous function on every
bounded interval, we may once again apply (17.31) to obtain

\[
n\int_{[x,y]} [\chi(t) - \sin t]\,Q_n(dt) \to \int_{[x,y]\setminus\{0\}} [\chi(t) - \sin t]\,\nu(dt)
\]

for every finite $x$ and $y$. Since $t \mapsto \chi(t) - \sin t$ is a bounded function, (16.22)
implies that $[x,y]$ can be replaced by $[x,\infty)$ or $(-\infty,y]$ in this equation.

To show $n\int \sin t\,Q_n(dt) \to 0$, thus finishing the proof, we change variables;
the quantity we want to study is

\[
n\int \Bigl[\sin\frac{s - c_n}{a_n}\Bigr]\,R(ds) ,
\]

which equals

\[
n\Bigl[\cos\frac{c_n}{a_n}\Bigr]\int \Bigl[\sin\frac{s}{a_n}\Bigr]\,R(ds)
 - n\Bigl[\sin\frac{c_n}{a_n}\Bigr]\int \Bigl[\cos\frac{s}{a_n}\Bigr]\,R(ds) .
\tag{17.36}
\]

We approximate $\cos\frac{c_n}{a_n}$ by 1, $\sin\frac{c_n}{a_n}$ by $\frac{c_n}{a_n} = \int \sin\frac{s}{a_n}\,R(ds)$, and
$\cos\frac{s}{a_n}$ by 1. If these approximations were exact, (17.36) would equal 0. It
is left for the reader to use Problem 18 below and the fact that the function
(17.21) is regularly varying of index 1 to show that the error arising from such
approximations goes to 0 as $n \to \infty$. □


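Problem 18. Suppose that $R$ is in the domain of attraction of a stable distribution
of index $\alpha$. Let $a_n$ be defined by (17.23). Prove that

\[
a_n = o\bigl(n^{(1/\alpha)+\varepsilon}\bigr) \quad\text{and}\quad n^{1/(\alpha+\varepsilon)} = o(a_n) \quad\text{as } n \to \infty
\]

for every $\varepsilon > 0$.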


Problem 19. Some steps in the proof of Theorem 12 have been explicitly left for 
the reader. Supply these missing steps, possibly using the preceding problem. 


Example 1. Let us apply Theorem 12 to a distribution $R$ supported by
$[0,\infty)$ for which

\[
\lim_{x\to\infty} x^{1/2}(\log x)^{-1}R(x,\infty) = 1 .
\tag{17.37}
\]

The integral

\[
\int_{[0,y]} s^2\,R(ds) = -\int_0^y s^2\,dR(s,\infty)
 = -y^2R(y,\infty) + 2\int_0^y sR(s,\infty)\,ds
\tag{17.38}
\]

is relevant for applying the theorem. Temporarily we act as if the limiting
relation (17.37) is an equality in which case (17.38) becomes

\[
-y^{3/2}\log y + 2\int_0^y s^{1/2}\log s\,ds = \tfrac{1}{3}y^{3/2}\log y - \tfrac{8}{9}y^{3/2}
 \sim \tfrac{1}{3}y^{3/2}\log y \quad\text{as } y \to \infty .
\]
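
The closed form in the last display can be confirmed numerically. The sketch below assumes that the left side is $-y^{3/2}\log y + 2\int_0^y s^{1/2}\log s\,ds$ and compares a midpoint-rule approximation with $\tfrac13 y^{3/2}\log y - \tfrac89 y^{3/2}$. The function names are illustrative only.

```python
import math


def left_side(y: float, steps: int = 200_000) -> float:
    """-y^{3/2} log y + 2 * int_0^y sqrt(s) log(s) ds, by the midpoint rule."""
    h = y / steps
    integral = 0.0
    for i in range(steps):
        s = (i + 0.5) * h  # midpoints avoid evaluating log at s = 0
        integral += math.sqrt(s) * math.log(s)
    integral *= h
    return -y ** 1.5 * math.log(y) + 2.0 * integral


def closed_form(y: float) -> float:
    """(1/3) y^{3/2} log y - (8/9) y^{3/2}."""
    return y ** 1.5 * math.log(y) / 3.0 - 8.0 * y ** 1.5 / 9.0


print(left_side(100.0), closed_form(100.0))
```

The second term $\tfrac89 y^{3/2}$ is of smaller order than the first only by a factor of $\log y$, so the asymptotic relation sets in slowly.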


It is easy to check that the error in treating (17.37) as an equality rather than
as a limiting relation is $o(y^{3/2}\log y)$. Thus,

\[
\int_{[0,y]} s^2\,R(ds) \sim \tfrac{1}{3}y^{3/2}\log y \quad\text{as } y \to \infty .
\tag{17.39}
\]

It is now clear that the function (17.21) is regularly varying with index
$\frac{3}{2} = 2 - \frac{1}{2}$ and that (17.22) is satisfied with $\gamma = 1$.

By Theorem 12, there exist constants $a_n$ such that $Q_n^{*n} \to Q$, where $Q_n$ is
defined by $Q_n(B) = R(a_n B)$ and $Q$ is strictly stable of index $\frac{1}{2}$ with characteristic
function

\[
u \mapsto \exp\bigl(-k_{1/2}|u|^{1/2}(1 - i\operatorname{sgn}(u))\bigr)
 = \exp\Bigl(-\tfrac{3\sqrt{2\pi}}{2}\,|u|^{1/2}(1 - i\operatorname{sgn}(u))\Bigr) .
\tag{17.40}
\]

Let $\varepsilon \in (0,1)$. From (17.23) (or slightly more quickly from (17.25)) and (17.39)
we obtain

\[
1 - \varepsilon < \frac{n\log a_n}{3a_n^{1/2}} < 1 + \varepsilon
\tag{17.41}
\]

for all sufficiently large $n$. Of course, we are willing to replace $(a_n)$ by any
sequence $(\hat a_n)$ for which $\hat a_n/a_n \to 1$ as $n \to \infty$. Indeed, we would like to find
such numbers $\hat a_n$ that can be written in a simple explicit form. Here is a try:
set $\varepsilon = 0$ in the above inequality and set the slowly varying function $\log$ equal
to a convenient constant, say 3, and solve to obtain $\hat a_n = n^2$. Now drop
the fiction that $\log \hat a_n$ is a constant and replace it by the better approximation
$2\log n$. Then, solve again for $\hat a_n$ to obtain

\[
\hat a_n = \tfrac{4}{9}\,n^2\log^2 n .
\tag{17.42}
\]

It remains to show that $\hat a_n/a_n \to 1$. From (17.41) and (17.42) we obtain

\[
(1-\varepsilon)^2\Bigl(\frac{2\log n}{\log a_n}\Bigr)^2 < \frac{\hat a_n}{a_n}
 < (1+\varepsilon)^2\Bigl(\frac{2\log n}{\log a_n}\Bigr)^2
\]

for all sufficiently large $n$. To finish the proof that $\hat a_n/a_n \to 1$ we need only
show that $(2\log n)/(\log a_n) \to 1$. But this asymptotic relation is an immediate
consequence of Problem 18.
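
The convergence $\hat a_n/a_n \to 1$ is slow, as a small numerical experiment suggests. The sketch below works with an idealized sequence: it assumes that (17.39) holds with equality, so that (17.25) becomes $n\log a = 3a^{1/2}$, and solves this equation for its larger root by fixed-point iteration on $L = \log a$. It then compares the root with the explicit choice $\hat a_n = \tfrac49 n^2\log^2 n$. The function names are hypothetical, and the idealized equation is only a stand-in for the true $a_n$ of a particular $R$.

```python
import math


def idealized_a(n: float, iterations: int = 200) -> float:
    """Larger root a of n*log(a) = 3*sqrt(a), via the iteration L -> 2*log(n*L/3)."""
    L = 2.0 * math.log(n) + 1.0  # starting guess for L = log(a)
    for _ in range(iterations):
        L = 2.0 * math.log(n * L / 3.0)
    return math.exp(L)


def a_hat(n: float) -> float:
    """Explicit approximation (4/9) n^2 log^2 n."""
    return (4.0 / 9.0) * n ** 2 * math.log(n) ** 2


for n in (1e6, 1e12, 1e24):
    print(n, a_hat(n) / idealized_a(n))
```

The printed ratios increase toward 1 but remain visibly below it even for very large $n$, because the ratio behaves like $(2\log n/\log a_n)^2$ and so approaches 1 only at a logarithmic rate.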

Let us summarize using random variables. Suppose that $(X_n : n = 1,2,\dots)$
is an iid sequence of nonnegative random variables with common distribution $R$
satisfying (17.37) and let $Y$ denote a strictly stable random variable of index $\frac{1}{2}$
with characteristic function (17.40). Then

\[
\frac{9\,(X_1 + \dots + X_n)}{4n^2\log^2 n} \xrightarrow{\ \mathcal{D}\ } Y
\]

as $n \to \infty$.
For the special situation at hand we can give additional information about
the limiting distribution $Q$. By Problem 17, it is supported by $\mathbb{R}^+$. To calculate
the moment generating function of $Q$ we use Proposition 11 to obtain the Lévy
measure: $\nu(x,\infty) = 3x^{-1/2}$, $x > 0$. Integration by parts in the Lévy-Khinchin
representation gives a formula for the moment generating function in terms of
the shift $\xi$:

\[
v \mapsto e^{-\xi v - 3\sqrt{\pi v}} .
\]

Since $Q$ is strictly stable (as can be seen by looking at its characteristic function),
$\xi = 0$. Comparison with Problem 16 of Chapter 15 shows that the density of $Q$
with respect to Lebesgue measure on $(0,\infty)$ is

\[
x \mapsto \frac{3}{2x^{3/2}}\,e^{-9\pi/(4x)} .
\]


Remark 1. Another approach to finding the moment generating function of
$Q$ in the preceding example is to use the substitution $v = -iu$, say for $u > 0$, in
the formula for the characteristic function:

\[
v \mapsto \exp\Bigl(-\tfrac{3\sqrt{2\pi}}{2}\,(iv)^{1/2}(1 - i)\Bigr) .
\]

The simplest way to justify this change of variables is to treat the moment
generating function as having domain equal to the right half of the complex
plane and using a bit of complex function theory. Continuity considerations force
the first-quadrant version of $i^{1/2}$ rather than the third-quadrant version. Thus
$i^{1/2}(1 - i) = \sqrt{2}$ and the formula given above for the moment generating function
results.
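
The complex arithmetic in this remark is easy to verify by machine. The sketch below assumes that the constant in the characteristic function is $3\sqrt{2\pi}/2$ and uses the principal square root from Python's `cmath`, which is the first-quadrant root of $i$; the variable names are illustrative.

```python
import cmath
import math

root_i = cmath.sqrt(1j)        # principal (first-quadrant) square root of i
factor = root_i * (1 - 1j)     # i^{1/2} (1 - i)
rate = (3.0 * math.sqrt(2.0 * math.pi) / 2.0) * factor

print(root_i)                  # both real and imaginary parts positive
print(factor, math.sqrt(2.0))
print(rate, 3.0 * math.sqrt(math.pi))
```

With the first-quadrant root the product is real and equals $\sqrt2$, so the exponent becomes $-3\sqrt{\pi v}$; the third-quadrant root would give $-\sqrt2$ instead and hence a function that grows with $v$.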


Problem 20. Let $T_n$ be the time of the $n$th return to 0 of a simple symmetric
random walk on $\mathbb{Z}$. The distribution $R$ of $T_1$ is given in Example 5 of Chapter 11,
and the distribution of $T_n$ is $R^{*n}$ by Theorem 12 of that same chapter. Prove that
$R$ is in the domain of attraction of a stable distribution of index $\alpha = \frac{1}{2}$. Use this
fact to estimate the quantity $P[T_{100} < 10000]$. Hint: A simple change of variables
allows one to use tables of the normal distribution function for the last part of the
problem.


Problem 21. Apply Theorem 12 to the symmetric distribution R given by 


R(x, œ) = (16+), «>0. 


Problem 22. Apply Theorem 12 to the distribution R given by 


R(y, œ) = 2R(—œ, —y) = (16 +y) "f, y20. 


Problem 23. Apply Theorem 12 to the distribution with density 


0 if |r| <1 
n> 
lz? if |x| > 1. 




Problem 24. Obtain the Classical Central Limit Theorem as a corollary of Theo- 
rem 12. 


Problem 25. Use Theorem 12 to redo Problem 7 of Chapter 15. 


Problem 26. Apply Theorem 12 to the distribution with density 


Problem 27. Apply Theorem 12 to distributions Q supported by R* and satisfying 
r°/?Q(r,00) + 5 as £ > œ. 


Problem 28. Apply Theorem 12 to distributions $R$ supported by $(-\infty, 0]$ and
satisfying

\[
\lim_{x\to-\infty} (|x|\log|x|)^{1/2}\,R(-\infty,x] = 5 .
\]


* Problem 29. Apply Theorem 12 to the distribution with density 


if |x| < e? 
GM 
|x|~*(logx)~? if |z| > e?. 


Problem 30. Apply Theorem 12 to the symmetric distribution function F given 
by 


1 . 
F(x) = | Fidos le? Le 
2 if —e<ax<0. 


e 


Make an interesting comment involving means. 


Problem 31. Apply Theorem 12 to distributions R that are symmetric about 0 
and satisfy xt’? exp(./log z)R(z,œ0) 4 1 as £z > œ. 


Problem 32. Let $R = R_1 * R_2$, where $R_1$ is a standard normal distribution and $R_2$
is a standard symmetric Cauchy distribution. Apply Theorem 12 to $R$.


In both the last chapter and this one, our goal has been to obtain conditions 
in terms of distributions because they are usually regarded as the fundamental 
data. But if one is given a transform of a distribution it is often easier, as in the 
following problem, to work with the transform rather than Theorem 12. 


Problem 33. As seen from Problem 33 of Chapter 5, the function $s \mapsto 1 - \sqrt{1 - s}$
is the probability generating function of a distribution R. Decide if R is in any 
domain of attraction and, if so, find both the stable type to which R is attracted 
and appropriate normalizing constants. 




17.4. † Domains of strict attraction


In Theorem 12 only one stable distribution appeared of each stable type. For all
$\alpha \ne 1$ the representative stable distribution there happens to be strictly stable.
Therefore, that theorem goes some distance towards the following result. 


Corollary 13. Let $\alpha \in (0,1) \cup (1,2]$, $\gamma \in [-1,1]$, and $Q$ be the strictly stable
distribution whose characteristic exponent is the function

\[
u \mapsto k_\alpha |u|^\alpha \bigl(1 - i\gamma(\tan\tfrac{\pi\alpha}{2})\operatorname{sgn}(u)\bigr) ,
\]

where $k_\alpha$ is defined by (17.20). Then a distribution $R$ is in the domain of strict
attraction of $Q$ if and only if the function

\[
y \mapsto \int_{[-y,y]} s^2\,R(ds)
\]

is regularly varying at $\infty$ of index $2 - \alpha$ and, in case $\alpha \ne 2$,

\[
\lim_{y\to\infty} \frac{\int_{[0,y]} s^2\,R(ds)}{\int_{[-y,y]} s^2\,R(ds)} = \frac{1+\gamma}{2}
\]

and, in case $\alpha > 1$, the mean of $R$ is 0. If $R$ is in the domain of strict attraction
of $Q$, then $\lim_{n\to\infty} Q_n^{*n} = Q$, where $Q_n$ is defined by $Q_n(B) = R(a_n B)$ with

\[
a_n = \inf\Bigl\{ y > 0 : \frac{1}{y^2}\int_{[-y,y]} s^2\,R(ds) \le \frac{1}{n} \Bigr\} .
\]


Problem 34. Prove the preceding corollary. 


The next theorem treats the domains of strict attraction of the nonconstant 
strictly stable distributions of index 1. 


Theorem 14. Let $\xi \in \mathbb{R}$ and denote by $Q$ the strictly stable distribution
whose characteristic exponent is the function

\[
u \mapsto -i\xi u + \frac{\pi}{2}|u| .
\]

Then a distribution $R$ is in the domain of strict attraction of $Q$ if and only if the
function

\[
y \mapsto \int_{[-y,y]} s^2\,R(ds)
\]

is regularly varying at $\infty$ of index 1,

\[
\lim_{y\to\infty} \frac{\int_{[0,y]} s^2\,R(ds)}{\int_{[-y,y]} s^2\,R(ds)} = \frac{1}{2} ,
\]

and

\[
\lim_{n\to\infty} \frac{n}{a_n}\int_{[-a_n,a_n]} s\,R(ds) = \xi ,
\]

where

\[
a_n = \inf\Bigl\{ y > 0 : \frac{1}{y^2}\int_{[-y,y]} s^2\,R(ds) \le \frac{1}{n} \Bigr\} .
\]

If $R$ is in the domain of strict attraction of $Q$, then $\lim_{n\to\infty} Q_n^{*n} = Q$, where
$Q_n$ is defined by $Q_n(B) = R(a_n B)$.


Problem 35. Prove the preceding theorem. 


Problem 36. Let Q be as in Theorem 14. Of course, Q is in its own domain of 
strict attraction. As a check on the correctness of Theorem 14, use it to prove 
that Q is in its own domain of strict attraction and also to find appropriate scaling 
constants an. 


To complete the story concerning domains of strict attraction, it remains to 
determine the domains of strict attraction of nonzero constants. Such a result 
might be called a general ‘Law of Large Numbers’ although many would use that 
term only with scaling constants an = n, as in Theorem 2 of Chapter 15. 


Theorem 15. For an arbitrary distribution $R$, let

\[
a_n = \inf\Bigl\{ y > 0 : \Bigl| \frac{1}{y}\int_{[-y,y]} s\,R(ds) \Bigr| \le \frac{1}{n} \Bigr\}
\tag{17.43}
\]

(the infimum of the empty set equaling $\infty$). Then $R$ is in the domain of strict attraction
of either $\delta_1$ or $\delta_{-1}$, the delta distributions at 1 and $-1$, if and only if $a_n \in (0,\infty)$
for all sufficiently large $n$,

\[
\lim_{n\to\infty} nR((-\infty,-a_n) \cup (a_n,\infty)) = 0 ,
\tag{17.44}
\]

and

\[
\lim_{n\to\infty} \frac{n}{a_n^2}\int_{[-a_n,a_n]} s^2\,R(ds) = 0 .
\tag{17.45}
\]

In case these conditions are satisfied either

\[
\int_{[-a_n,a_n]} s\,R(ds) > 0
\tag{17.46}
\]

for all sufficiently large $n$ and $\lim_{n\to\infty} Q_n^{*n} = \delta_1$, where $Q_n$ is defined by
$Q_n(B) = R(a_n B)$, or

\[
\int_{[-a_n,a_n]} s\,R(ds) < 0
\tag{17.47}
\]

for all sufficiently large $n$ and $\lim_{n\to\infty} Q_n^{*n} = \delta_{-1}$.




PARTIAL PROOF. Suppose that $a_n \in (0,\infty)$ for all large $n$ and that (17.44)
and (17.45) hold. Clearly (16.22) and (16.24) in Corollary 23 of Chapter 16 hold
with $\sigma = 0$, $\nu = 0$, and $Q_n$ defined as in the theorem at hand.

In order to verify (16.23) with $\eta = \pm 1$, and thus complete the proof of the 'if'
portion of the theorem we first study the sequence $(a_n)$. Clearly this sequence
is increasing. By the right continuity of $y \mapsto y^{-1}\bigl|\int_{[-y,y]} s\,R(ds)\bigr|$,

\[
\frac{n}{a_n}\Bigl| \int_{[-a_n,a_n]} s\,R(ds) \Bigr| \le 1 .
\]

Let $\zeta \in (0,1)$. Then

\[
\frac{n}{(1-\zeta)a_n}\Bigl| \int_{[-(1-\zeta)a_n,\,(1-\zeta)a_n]} s\,R(ds) \Bigr| > 1 .
\tag{17.48}
\]

Use

\[
\frac{n}{(1-\zeta)a_n}\int_{[-a_n,\,-(1-\zeta)a_n)\,\cup\,((1-\zeta)a_n,\,a_n]} |s|\,R(ds)
 \le \frac{n}{(1-\zeta)^2 a_n^2}\int_{[-a_n,a_n]} s^2\,R(ds)
\tag{17.49}
\]

in conjunction with (17.45) to enlarge the domain of integration in (17.48) to
$[-a_n,a_n]$ and then multiply both sides by $1 - \zeta$ to obtain

\[
\liminf_{n\to\infty} \frac{n}{a_n}\Bigl| \int_{[-a_n,a_n]} s\,R(ds) \Bigr| \ge 1 - \zeta .
\]

We let $\zeta \searrow 0$ to conclude

\[
\lim_{n\to\infty} \frac{n}{a_n}\Bigl| \int_{[-a_n,a_n]} s\,R(ds) \Bigr| = 1 .
\tag{17.50}
\]

Let $\varepsilon > 0$. Using a slight variation of (17.49) in conjunction with (17.50), we
obtain

\[
\frac{1}{(1+\beta)a_n}\int_{[-(1+\beta)a_n,\,(1+\beta)a_n]} s\,R(ds)
 \sim \frac{1}{1+\beta}\,\frac{1}{a_n}\int_{[-a_n,a_n]} s\,R(ds) ,
\]

uniformly for $\beta \in [0,\varepsilon]$ as $n \to \infty$. We conclude that $a_{n+1} < (1+\varepsilon)a_n$ for
sufficiently large $n$ and then that either (17.46) holds for all sufficiently large $n$
or that (17.47) holds for all sufficiently large $n$.

The following calculation, using (17.44), completes the proof of (16.23):

\[
\lim_{n\to\infty} n\int \chi(t)\,Q_n(dt) = \lim_{n\to\infty} \int \chi(s/a_n)\,(nR)(ds)
 = \lim_{n\to\infty} \frac{n}{a_n}\int_{[-a_n,a_n]} s\,R(ds) .
\]


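For the converse, we suppose that, for some sequence $\tilde a_n$ of positive constants,
$\tilde Q_n^{*n} \to \delta_1$, where $\tilde Q_n$ is defined by $\tilde Q_n(B) = R(\tilde a_n B)$, because we will omit the case
of convergence to $\delta_{-1}$. Thus, the conditions in Theorem 22 and Corollary 23 of
Chapter 16 hold with $\eta = 1$, $\sigma = 0$, $\nu = 0$, and $\tilde Q_n$ in place of $Q_n$. By (16.16) and
(16.18)

\[
\frac{n}{\tilde a_n}\int_{[-\tilde a_n,\tilde a_n]} s\,R(ds) = n\int_{[-1,1]} t\,\tilde Q_n(dt) \to 1 .
\tag{17.51}
\]

Replacing $\tilde a_n$ by $\tilde a_n/(1+\varepsilon)$ in the definition of $\tilde Q_n$ changes the value of
$\lim \tilde Q_n^{*n}$ to $\delta_{1+\varepsilon}$. Hence

\[
\frac{n(1+\varepsilon)}{\tilde a_n}\int_{[-\tilde a_n/(1+\varepsilon),\,\tilde a_n/(1+\varepsilon)]} s\,R(ds) \to 1 + \varepsilon .
\]

By using a diagonalization procedure as $n \to \infty$ and $\varepsilon \searrow 0$ we can find a new
sequence $(\hat a_n)$, asymptotic to $(\tilde a_n)$, such that

\[
\frac{n}{\hat a_n}\int_{[-\hat a_n,\hat a_n]} s\,R(ds) = n\int_{[-1,1]} t\,\hat Q_n(dt) \searrow 1 \quad\text{as } n \to \infty
\tag{17.52}
\]

and $\hat Q_n^{*n} \to \delta_1$ where $\hat Q_n$ is defined by $\hat Q_n(B) = R(\hat a_n B)$. By (17.52), $\hat a_n \le a_n$
(where $a_n$ is defined by (17.43)). It is left for the reader to show that $a_n \le 2\hat a_n$
for all large $n$, so that in particular $a_n < \infty$.

By (16.16),

\[
nR((-\infty,-a_n) \cup (a_n,\infty)) \le nR((-\infty,-\hat a_n) \cup (\hat a_n,\infty)) \to 0 .
\]

By (16.24) and (16.16),

\[
\frac{n}{a_n^2}\int_{[-a_n,a_n]} s^2\,R(ds)
 \le \frac{n}{\hat a_n^2}\int_{[-\hat a_n,\hat a_n]} s^2\,R(ds) + nR((-\infty,-\hat a_n) \cup (\hat a_n,\infty)) \to 0 ,
\]

thus completing the proof. □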


Problem 37. One part of the preceding proof has been explicitly left for the reader. 
Complete that part. 


* Problem 38. Let X be a Cauchy random variable symmetric about 0. Decide if 
|X| is in any domain of attraction and, if so, identify the attracting stable type 
and find appropriate normalizing constants for convergence. Also, decide if |X| is 
in any domain of strict attraction and, if so, identify the attracting strict stable 
type and find appropriate scaling constants for convergence. 


Problem 39. Generalize the preceding problem by continuing the assumption of 
symmetry, but removing the requirement that 0 be the center of symmetry. 


Problem 40. Decide which stable distributions belong to which domains of strict 
attraction, if any. 




Problem 41. Show that the slow variation at $\infty$ of

\[
y \mapsto \int_{[-y,y]} |s|\,R(ds)
\]

in combination with the existence of

\[
\lim_{y\to\infty} \frac{\int_{[0,y]} s\,R(ds)}{\int_{[-y,y]} |s|\,R(ds)}
\]

as a number larger than $\frac{1}{2}$ is sufficient for $R$ to be in the domain of strict attraction
of $\delta_1$.


Problem 42. Can Problem 38 be done using the preceding problem? If so, do so. 


If the conditions in Problem 41 were necessary for $R$ to be in the domain of
strict attraction of $\delta_1$, these conditions would make a better theorem than Theorem 15,
since they do not involve the sequence $(a_n)$. However, the following problem
shows that they are not necessary.


Problem 43. Consider the distribution having the following density with respect 


to Lebesgue measure: 
tyes lvl ify < —1 


y 
y~ <0 if -l<y<l 


ry OR ifl<y, 


where $k$ is defined by the property that the integral on $\mathbb{R}$ of the density equals 1.
Show that this distribution is in the domain of strict attraction of $\delta_1$, but that this
fact cannot be deduced from Problem 41.


As indicated by the last few problems, Theorem 15 is not entirely satisfactory. 


Unfortunately, although there are other results of this type in the literature, the 
authors of this book have not been able to find anything better for the general 
case. However, for the nonnegative case, there is a satisfactory result, which we 
state here without proof. 


Theorem 16. Let $R$ be a distribution on $\mathbb{R}^+$ with distribution function $F$.
Then $R$ is in the domain of strict attraction of $\delta_1$ if and only if


2 Py). 
a F(y) = 


in which case the scaling constants may be chosen as in Theorem 15. 


CHAPTER 18 


Convergence in Distribution 
on Polish Spaces 


We want to extend the concept of convergence in distribution to probability
spaces other than $(\mathbb{R}, \mathcal{B})$. Certain metric spaces, known as 'Polish spaces', play
a central role. Particularly important examples of Polish spaces are the real
line, the extended real line, $d$-dimensional Euclidean space, infinite products of
intervals, and spaces of continuous functions. Thus, this chapter may be viewed
as a mechanism for extending the concepts and results discussed in Chapter 14
to a wide variety of settings. (Basic facts about metric spaces are treated briefly
in Appendix B. Some of the topology in Appendix C is also relevant.)

Particular attention will be given to distributions on $\mathbb{R}^d$. A central limit
theorem will be proved by using the 'Cramér-Wold Device' which reduces certain
problems for $\mathbb{R}^d$ to problems for $\mathbb{R}$.


18.1. Polish spaces 


Much of the theory developed in Chapter 14 can be adapted to a certain type of 
metric space which we now define. 


Definition 1. A Polish space is a complete metric space that has a countable 
dense subset. 


The real line $\mathbb{R}$ (with the usual Euclidean metric) is a Polish space, with
the rational numbers constituting a countable dense subset. As described in
Appendix B, the extended real line $\overline{\mathbb{R}}$ is a complete metric space with the distance
between $x$ and $y$ defined as $|\arctan y - \arctan x|$. It is a Polish space because
the rational numbers constitute a countable dense set.


Remark 1. Because the definition of Polish space requires completeness of
the metric, the correct choice of metric is important. For example, if we used
the metric $\rho(x,y) = |\arctan x - \arctan y|$ in $\mathbb{R}$, we would not have completeness,
despite the fact that the open sets arising from this metric are the same as those
arising from the usual metric.
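
A concrete way to see the failure of completeness is to look at the sequence $10, 10^2, 10^3, \dots$ in $\mathbb{R}$. The sketch below, with the illustrative name `rho` for the arctangent metric, prints the distances between consecutive terms in both metrics.

```python
import math


def rho(x: float, y: float) -> float:
    """Arctangent metric |arctan x - arctan y| on the real line."""
    return abs(math.atan(x) - math.atan(y))


points = [10.0 ** k for k in range(1, 9)]
gaps = [rho(a, b) for a, b in zip(points, points[1:])]
usual = [abs(b - a) for a, b in zip(points, points[1:])]

for g, u in zip(gaps, usual):
    print(f"arctan gap {g:.3e}   usual gap {u:.3e}")
```

The arctangent gaps shrink geometrically, so the sequence is Cauchy for $\rho$, while it is unbounded in the usual metric and therefore has no limit in $\mathbb{R}$. Its limit for $\rho$ exists only in the extended real line, namely $+\infty$.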


Another important Polish space is $\mathbb{R}^d$ with the usual (Euclidean) metric: the
distance between $x$ and $y$ equals

\[
\sqrt{\sum_{j=1}^d (y_j - x_j)^2} .
\]

The set of points having rational coordinates is a countable dense set. Three
other metrics for $\mathbb{R}^d$ that give the same open sets as this metric and also make
$\mathbb{R}^d$ into a Polish space are

\[
\sum_{j=1}^d |y_j - x_j| , \qquad \sum_{j=1}^d (|y_j - x_j| \wedge 1) ,
\]

and

\[
\sum_{j=1}^d \frac{|y_j - x_j| \wedge 1}{2^j} .
\tag{18.1}
\]

Although (18.1) is perhaps the most complicated of the alternate metrics
for $\mathbb{R}^d$, it has the advantage that it generalizes easily to $\mathbb{R}^\infty$, as shown by the
following example.


Example 1. We use $\mathbb{R}^\infty$ to denote the space of all sequences $(x_1, x_2, \dots)$ of
real numbers, with the product topology. We want to metrize this topological
space; that is, we want to make it into a metric space in such a way that the
metric gives the same open sets as does the product topology. We define the
distance between two sequences $x$ and $y$ to be

\[
\rho(x,y) = \sum_{j=1}^\infty \frac{|y_j - x_j| \wedge 1}{2^j} .
\tag{18.2}
\]

It is straightforward to check that this definition gives a metric for $\mathbb{R}^\infty$. We will
check that this metric gives the same open sets as does the product topology.

We must show (i) if $O$ is an open set in the product topology, then for each
$x \in O$, there is an open set $U$ in the topology given by the metric $\rho$ such that
$x \in U$ and $U \subseteq O$, and (ii) if $U$ is an open set in the topology given by $\rho$, then
for each $x \in U$, there is an open set $O$ in the product topology such that $x \in O$
and $O \subseteq U$.

Every open set $O$ in the product topology is the union of sets of the form

\[
\{ (x_1, x_2, \dots) : (x_1, \dots, x_d) \in O_d \}
\tag{18.3}
\]

for some positive integer $d$, where $O_d$ is an open set in $\mathbb{R}^d$. It follows that in
proving (i) and (ii), we may restrict our attention to sets $O$ of the form (18.3).
Similarly, every open set $U$ in the topology given by the metric $\rho$ is a union of
sets that are open balls in that metric, so in (i) and (ii) we only need to consider
open balls $U$.

We will prove (ii), and leave the proof of (i) to the reader. Let $U$ be an open
ball in $\mathbb{R}^\infty$ with the metric $\rho$, and pick a point $x = (x_1, x_2, \dots) \in U$. By a
standard argument involving the triangle inequality, there exists an $\varepsilon > 0$ such
that the open ball $U' = \{y : \rho(x,y) < \varepsilon\}$ is contained in $U$. We will find a set
of the form (18.3) that contains $x$ and is contained in $U'$. Choose $d$ such that
$2^{d-1} > \frac{1}{\varepsilon}$, and let $O_d$ be the open ball centered at the point $(x_1, \dots, x_d)$ in $\mathbb{R}^d$
with radius $\frac{\varepsilon}{2}$, using the metric (18.1). Let $O$ be defined in terms of $O_d$ as in
(18.3). Clearly $x \in O$. Moreover, it is easy to check from the definition of $\rho$ that
if $y \in O$, then $\rho(x,y) < \varepsilon$, so $O \subseteq U'$, as desired.

Those sequences containing only rational terms and only finitely many terms
different from 0 constitute a countable dense set in $\mathbb{R}^\infty$. The metric space $\mathbb{R}^\infty$ is
also complete, a consequence of the second problem below and the completeness
of $\mathbb{R}$. Therefore $\mathbb{R}^\infty$ is a Polish space.
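
For sequences with only finitely many nonzero terms the metric of this example can be computed exactly. The following sketch assumes that (18.2) is $\rho(x,y) = \sum_{j\ge1} (|y_j - x_j| \wedge 1)/2^j$ and represents such a sequence by a finite Python list that is implicitly padded with zeros; the function name `rho` and this representation are illustrative choices.

```python
from itertools import zip_longest


def rho(x, y):
    """Metric (18.2), as assumed above, for finitely supported sequences.

    Each argument is a list holding the leading terms of a sequence whose
    remaining terms are all 0, so the infinite sum reduces to a finite one.
    """
    pairs = zip_longest(x, y, fillvalue=0.0)
    return sum(min(abs(b - a), 1.0) / 2 ** j
               for j, (a, b) in enumerate(pairs, start=1))


print(rho([0.5, 3.0], []))           # 0.5/2 + min(3, 1)/4
print(rho([1.0, 2.0], [1.0, 2.0]))   # identical sequences
print(rho([0.0, 0.0, 100.0], []))    # a large entry far out contributes little
```

The truncation at 1 caps the contribution of coordinate $j$ at $2^{-j}$, which is why the tail beyond coordinate $d$ can be ignored in the proof of (ii) once $2^{-d}$ is small.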


Problem 1. Decide if the following sequence in $\mathbb{R}^\infty$ converges and if so to what:

\[
((1,0,0,0,0,\dots),\ (0,2,0,0,0,\dots),\ (0,0,4,0,0,\dots),\ (0,0,0,8,0,\dots),\ \dots) .
\]


Problem 2. Let $x_n$, $n = 1,2,\dots$, and $y$ be members of $\mathbb{R}^\infty$. Prove that $x_n \to y$ as
$n \to \infty$ if and only if $x_{j,n} \to y_j$ for each $j$, where $x_{j,n}$ is the $j$th term of $x_n$ and $y_j$ is
the $j$th term of $y$. (Comment: One says that the topology that we have given $\mathbb{R}^\infty$ is the
topology of coordinate-wise convergence.)


The method used in Example 1 to find a suitable metric for $\mathbb{R}^\infty$ generalizes
easily to countable products of arbitrary Polish spaces.


Proposition 2. For $j = 1,2,\dots$, let $(\Psi_j, \rho_j)$ be a Polish space, and let
$\Psi = \prod_{j=1}^\infty \Psi_j$, with the topology on $\Psi$ being the product topology. For $x = (x_1, x_2, \dots)$
and $y = (y_1, y_2, \dots)$ in $\Psi$, define

\[
\rho(x,y) = \sum_{j=1}^\infty \frac{\rho_j(x_j, y_j) \wedge 1}{2^j} .
\]

Then $(\Psi, \rho)$ is a Polish space.


In Example 1 and the preceding proposition, the setting is one in which the 
space of interest already has a natural topology attached to it. Therefore the 
problem was to construct a metric consistent with the topology so that the re- 
sulting metric space is a Polish space. The following problem presents a situation 
where there is already a natural choice for the metric. 


Problem 3. Let $C[0,1]$ denote the metric space of continuous $\mathbb{R}$-valued functions
on $[0,1]$ with the distance between two functions $f$ and $g$ being defined as the
quantity

\[
\max\{ |f(t) - g(t)| : t \in [0,1] \} .
\tag{18.4}
\]

Prove that $C[0,1]$ is a Polish space and that the Borel $\sigma$-field equals

\[
\sigma(\{f : f(t) \in B\} : t \in [0,1],\ \text{Borel } B \subseteq \mathbb{R}) .
\]


The following proposition provides further examples of Polish spaces. 


Proposition 3. A closed subset of a Polish space is a Polish space with the 
inherited metric. 


PROOF. Let $(\Psi, \rho)$ denote the Polish space and $C$ the closed subset. By
Problem 1 of Appendix B, $(C, \rho)$ is a metric space.

Consider a Cauchy sequence in $C$. It converges to a member $\psi$ of $\Psi$. This
point $\psi$ must be a member of $C$; otherwise $C$ would not be closed. Thus $C$ is
complete.

Let $D$ be a countable dense subset of $\Psi$. For each positive integer $n$ let $D_n$
consist of those members of $D$ whose distance from $C$ is less than $\frac{1}{n}$. For each
member $\psi$ of $D_n$ choose a member of $C$, conceivably $\psi$ itself, whose distance
from $\psi$ is less than $\frac{1}{n}$ and let $E_n$ denote the set of such chosen points. Every
point in $C$ is within a distance of $\frac{2}{n}$ of some member of $E_n$. Hence, $\bigcup_{n=1}^\infty E_n$ is
a countable dense subset of $C$. □


Problem 4. Explain why $\{f \in C[0,1] : f(0) = 0\}$ is a Polish space, with the
distance between $f$ and $g$ being specified by (18.4).


* Problem 5. Give the set $C[0,\infty)$ of continuous $\mathbb{R}$-valued functions on $[0,\infty)$ a
metric so that it becomes a Polish space with the topology of uniform convergence
on bounded sets.


Example 2. [Infinite-dimensional cube] Consider the space [0,1]^∞. This is 
the set of all sequences (x_1, x_2, ...) of real numbers belonging to the interval 
[0,1]. It is a subset of the space ℝ^∞ introduced in Example 1, and it is easy to 
check that it is closed, since it is a product of closed sets. By Proposition 3, it 
is itself a Polish space, with distance function ρ given by (18.2). 

We may also take the point of view that [0,1]^∞ is a countable product of 
Polish spaces, and hence is itself a Polish space by Proposition 2. And since 
it is also the product of compact sets, it is compact by the Tychonoff Theorem 
(Theorem 3 of Appendix C). This fact gives importance to the next result, which 
says that an arbitrary Polish space is topologically equivalent to a Borel subset 
of the infinite-dimensional cube. 


Lemma 4. For any Polish space Ψ, there exists a function φ from Ψ onto a 
Borel subset of [0,1]^∞ such that φ is continuous and one-to-one on Ψ and φ^{-1} 
is continuous (and one-to-one) on φ(Ψ). 


PROOF. Let Ψ be an arbitrary Polish space with metric σ, and let (ψ_n: n = 
1,2,...) be a countable dense subset of Ψ. Consider the function φ: Ψ → [0,1]^∞ 
defined by 

    $\varphi(\psi) = \bigl(\sigma(\psi,\psi_1) \wedge 1,\ \sigma(\psi,\psi_2) \wedge 1,\ \dots\bigr)$. 

We claim that φ is one-to-one and that both φ and φ^{-1} (defined on the image 
of φ) are continuous. 

The continuity of φ is an immediate consequence of the continuity of the 
functions ψ ↦ σ(ψ, ψ_n) for all n. To prove that φ is one-to-one, let ψ and η be 
two distinct members of Ψ, and let ε = σ(ψ, η) ∧ 2. By the definition of a metric, 
ε > 0. Using the fact that the set (ψ_n: n = 1,2,...) is dense, choose k so that 
σ(ψ, ψ_k) < ε/2. It follows from the triangle inequality that σ(η, ψ_k) > ε/2, so 
φ(η) and φ(ψ) necessarily differ at the k-th coordinate. Therefore, φ(η) ≠ φ(ψ), 
and we may conclude that φ is one-to-one. 
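The one-to-one argument can be traced on a concrete case. The sketch below takes, as an illustrative assumption, the Polish space [0,1] with the usual distance and the dyadic rationals as the countable dense sequence; the helper names are hypothetical and only finitely many coordinates of the embedding are computed.

```python
from fractions import Fraction
from itertools import count, islice

def dyadic_rationals():
    """Enumerate a dense sequence in [0, 1]: 0, 1, 1/2, 1/4, 3/4, 1/8, ..."""
    yield Fraction(0)
    yield Fraction(1)
    for m in count(1):
        for k in range(1, 2 ** m, 2):
            yield Fraction(k, 2 ** m)

def phi(psi, n_coords):
    """First n_coords coordinates of (|psi - psi_k| ∧ 1 : k = 1, 2, ...)."""
    return [min(abs(psi - q), Fraction(1))
            for q in islice(dyadic_rationals(), n_coords)]

def separating_coordinate(psi, eta, n_coords=1000):
    """0-based index k with |psi - psi_k| < eps/2, where eps = |psi - eta| ∧ 2."""
    eps = min(abs(psi - eta), Fraction(2))
    for k, q in enumerate(islice(dyadic_rationals(), n_coords)):
        if abs(psi - q) < eps / 2:
            return k
    return None  # not found among the first n_coords dense points

psi, eta = Fraction(3, 10), Fraction(31, 100)
k = separating_coordinate(psi, eta)
coords_psi, coords_eta = phi(psi, k + 1), phi(eta, k + 1)
```

At the index k found this way, the k-th coordinate of the image of psi lies below eps/2 and that of eta lies above it, which is exactly how the proof separates the two points.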

To show that φ^{-1} is continuous at an arbitrary point (x_1, x_2, ...) in φ(Ψ), set 
φ^{-1}(x_1, x_2, ...) = ψ, fix ε ∈ (0, 1), and then consider an arbitrary 
(y_1, y_2, ...) ∈ φ(Ψ) for which 

    $\rho\bigl((x_1,x_2,\dots),(y_1,y_2,\dots)\bigr) < \dfrac{\varepsilon}{3 \cdot 2^k}$, 

where ρ is given by (18.2) and k is chosen so that σ(ψ, ψ_k) < ε/3. Let η = 
φ^{-1}(y_1, y_2, ...). Note that, by the definition of φ, σ(η, ψ_k) = y_k and 
σ(ψ, ψ_k) = x_k. It follows from the definition of the metric ρ that 

    $|\sigma(\eta,\psi_k) - \sigma(\psi,\psi_k)| \le 2^k \rho\bigl((x_1,x_2,\dots),(y_1,y_2,\dots)\bigr) < \dfrac{\varepsilon}{3}$, 

so σ(η, ψ_k) < 2ε/3. Thus 

    $\sigma(\psi,\eta) \le \sigma(\psi,\psi_k) + \sigma(\eta,\psi_k) < \varepsilon$. 

The continuity of φ^{-1} follows. 

We will prove that φ(Ψ) is a Borel set by writing it in terms of countably many 
operations involving open and closed subsets of [0,1]^∞. Let D be a countable 
dense subset of Ψ. For each d ∈ D and each positive integer k, let B(d, 1/k) 
be the open ball of radius 1/k centered at d. By the continuity of φ^{-1}, the set 
φ(B(d, 1/k)) is an open subset of φ(Ψ) in the relative topology, so there exists 
a set V(d, k) which is open in the topology of [0,1]^∞ such that 

    $\varphi(B(d,1/k)) = V(d,k) \cap \varphi(\Psi)$. 

Let V(k) be the union of the sets V(d, k) over d ∈ D. We claim that 

(18.5)    $\varphi(\Psi) = \overline{\varphi(\Psi)} \cap \Bigl(\bigcap_{k=1}^{\infty} V(k)\Bigr)$, 

where $\overline{\varphi(\Psi)}$ is the closure in [0,1]^∞ of φ(Ψ). Clearly, (18.5) 
implies that φ(Ψ) is Borel. 

Each member of φ(Ψ) belongs to each V(k) and thus to the set on the right 
side of (18.5). It remains to show that if v belongs to the set on the right side of 
(18.5), then v belongs to φ(Ψ). For each k, the fact that v ∈ V(k) implies the 
existence of d_k ∈ D such that v ∈ V(d_k, k). Since v is in the closure of φ(Ψ), 
every open neighborhood of v contains a member of φ(Ψ). In particular, for each 
k, we may choose v_k ∈ φ(Ψ) such that 

    $v_k \in V(d_1,1) \cap \dots \cap V(d_k,k) \cap \{u \in [0,1]^\infty : \rho(u,v) < 1/k\}$. 

Note that v_k → v as k → ∞. Also note that (φ^{-1}(v_k): k = 1,2,...) is a 
Cauchy sequence in Ψ, since for j, k ≥ m, v_j and v_k are both members of 
V(d_m, m) and hence σ(φ^{-1}(v_j), φ^{-1}(v_k)) < 2/m. Since Ψ is a Polish space, 
ψ = lim_k φ^{-1}(v_k) exists. By the continuity of φ, φ(ψ) = lim_k v_k = v. Thus 
v ∈ φ(Ψ), as desired. □ 


Problem 6. [Infinite-dimensional space-filling curve] It is a well-known fact, often 
discussed in topology texts, that there exists a continuous function h from [0,1] 
onto [0,1]^2. Such a function is a Peano curve, named after its discoverer. Use 
h to construct a continuous function from [0,1] onto [0,1]^∞. Hint: Since h is a 
continuous [0,1]^2-valued function, it can be expressed in terms of two continuous 
[0,1]-valued functions as h = (h_1, h_2). Consider the function 

    $(h_1,\ h_1 \circ h_2,\ h_1 \circ h_2 \circ h_2,\ h_1 \circ h_2 \circ h_2 \circ h_2,\ \dots)$. 


18.2. Definition of and criteria for convergence 


When we view Polish spaces as measurable spaces we follow our customary 
convention that the σ-field is the Borel σ-field unless something to the contrary 
is explicitly stated. In particular, throughout this section this convention is in 
force. The following definition is motivated by Proposition 7 of Chapter 14. 


Definition 5. Let Q and Q_n, n = 1,2,..., be probability measures on a 
Polish space Ψ. Then (Q_n: n = 1,2,...) converges to Q, denoted by Q_n → Q as 
n → ∞, if, for every ℝ-valued bounded continuous function g on Ψ, 
∫ g dQ_n → ∫ g dQ as n → ∞. 
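As a numerical illustration of the definition, the sketch below takes Q_n to be the uniform distribution on {1/n, 2/n, ..., 1} and Q the uniform distribution on [0,1]; this choice, the test function g = cos, and the helper name are assumptions made for the example. Checking one g is of course only a sanity check, not a proof of convergence.

```python
import math

def integral_against_Qn(g, n):
    """Integral of g with respect to Q_n, uniform on {1/n, 2/n, ..., 1}."""
    return sum(g(k / n) for k in range(1, n + 1)) / n

# For Q uniform on [0, 1] and g = cos, the limiting integral is sin(1).
g = math.cos
limit = math.sin(1.0)
errors = [abs(integral_against_Qn(g, n) - limit) for n in (10, 100, 1000, 10000)]
```

The list `errors` shrinks as n grows, consistent with ∫ g dQ_n → ∫ g dQ for this bounded continuous g.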


For random variables having values in a Polish space, we say that a sequence 
(X_n) converges to X in distribution and write 

    $X_n \xrightarrow{D} X$ 

if Q_n → Q, where Q_n and Q denote the distributions of X_n and X, respectively. 

The following theorem generalizes Proposition 7 of Chapter 14 to the setting 
of Polish spaces. The name of the theorem refers to the fact that it contains so 
many conditions. Note the new condition (vi) and the change in condition (ii). 
We leave it to the reader to check that the set which appears in condition (vi) 
is a Borel set. As in Proposition 7 of Chapter 14 we use ∂A for the boundary of 
a set A. 




Theorem 6. [Portmanteau] Let Q and Q_n, n = 1,2,..., be probability 
measures on a Polish space Ψ. Then the following conditions are equivalent: 

(i) Q_n → Q as n → ∞; 

(ii) lim_{n→∞} ∫ g dQ_n = ∫ g dQ for each bounded uniformly continuous 
function g on Ψ; 

(iii) limsup_{n→∞} Q_n(C) ≤ Q(C) for each closed subset C of Ψ; 

(iv) liminf_{n→∞} Q_n(O) ≥ Q(O) for each open subset O of Ψ; 

(v) lim_{n→∞} Q_n(A) = Q(A) for each Borel subset A of Ψ for which 
Q(∂A) = 0; 

(vi) lim_{n→∞} ∫ g dQ_n = ∫ g dQ for each bounded measurable function g 
for which Q({ψ ∈ Ψ: g is discontinuous at ψ}) = 0. 
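A simple worked example, chosen here for illustration, shows that the inequalities in (iii) and (iv) can be strict and that the boundary condition in (v) cannot be dropped. Take point masses on ℝ:

```latex
% Illustration (an assumed example): Q_n = \delta_{1/n}, Q = \delta_0 on \mathbb{R}.
\begin{align*}
\int g \, dQ_n &= g(1/n) \longrightarrow g(0) = \int g \, dQ
   && \text{for bounded continuous } g, \text{ so } Q_n \to Q; \\
\limsup_{n\to\infty} Q_n(\{0\}) &= 0 < 1 = Q(\{0\})
   && (\{0\} \text{ is closed}); \\
\liminf_{n\to\infty} Q_n((0,\infty)) &= 1 > 0 = Q((0,\infty))
   && ((0,\infty) \text{ is open}); \\
Q(\partial (0,\infty)) &= Q(\{0\}) = 1 \ne 0
   && \text{so (v) does not apply to } A = (0,\infty).
\end{align*}
```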


PROOF. That (i) => (ii) and (vi) => (i) are both obvious. 

The proof that (ii) => (iii) is essentially the same as the proof of the cor- 
responding part of Proposition 7 of Chapter 14, since the functions introduced 
in that proof are uniformly continuous and are defined in a manner that works 
equally well in a metric space. 

The proofs that (iii) <=> (iv) and {(iii), (iv)} => (v) are also the same as 
the corresponding parts of the proof of Proposition 7 of Chapter 14. 

Finally we prove that (v) => (vi). Let g be a bounded measurable function 
from Ψ to ℝ, and let 

    $D = \{\psi \in \Psi : g \text{ is discontinuous at } \psi\}$. 

Assume that Q(D) = 0. For each n, g may be regarded as a bounded random 
variable from (Ψ, 𝒜, Q_n) to (ℝ, ℬ), where 𝒜 and ℬ are the respective Borel 
σ-fields. Let R_n be the distribution of this random variable. Similarly, g is a 
bounded random variable from (Ψ, 𝒜, Q) to (ℝ, ℬ) whose distribution will be 
denoted by R. We will first show that R_n → R as n → ∞. 

Let B be a Borel subset of ℝ for which R(∂B) = 0. Then Q(g^{-1}(∂B)) = 0. 
It is easily checked that if ψ ∈ ∂g^{-1}(B), then either g is discontinuous at ψ, 
or g(ψ) ∈ ∂B. Hence ∂g^{-1}(B) ⊂ D ∪ g^{-1}(∂B), so that Q(∂g^{-1}(B)) = 0. 
Thus, Q_n(g^{-1}(B)) → Q(g^{-1}(B)) as n → ∞, or equivalently, R_n(B) → R(B) as 
n → ∞. By (v) of Proposition 7 of Chapter 14, R_n → R as n → ∞. 

By (ii) of Proposition 7 of Chapter 14, ∫ h dR_n → ∫ h dR as n → ∞ for each 
bounded continuous ℝ-valued function h on ℝ. We apply this fact with 

    $h(x) = \begin{cases} x & \text{if } |x| \le c \\ -c & \text{if } x < -c \\ c & \text{if } x > c \end{cases}$ 

where c = sup{|g(ψ)|: ψ ∈ Ψ}. Doing so completes the proof since ∫ g dQ_n = 
∫ h dR_n and ∫ g dQ = ∫ h dR. □ 




Corollary 7. Let Q and R be two distributions on a Polish space Ψ. If 

(18.6)    $\int g \, dR = \int g \, dQ$ 

for each bounded uniformly continuous function g on Ψ then R = Q. 


PROOF. Suppose that (18.6) holds. By letting Q_n = R for all n in the 
Portmanteau Theorem one gets R(O) ≥ Q(O) for every open set O. Interchanging 
the roles of Q and R gives Q(O) ≥ R(O). Since R(O) = Q(O) for every open 
set O, it follows by the Uniqueness Theorem that R = Q. □ 


Corollary 8. Let Q and R be two distributions on a Polish space. If both R 
and Q are limits of the same sequence of distributions, then R= Q. 


PROOF. If R and Q are both limits of the same sequence, then by the 
Portmanteau Theorem, (18.6) holds. □ 


It is not immediately apparent from the definition of convergence for a se- 
quence of distributions on a Polish space that a given sequence can converge to 
only one distribution, but the preceding corollary shows that to be the case. (Of 
course, a sequence that does not converge might have various subsequences that 
converge to different limits.) 

In Chapter 14 we found a connection between convergence in distribution on 
R and convergence in probability of sequences of R-valued random variables. 
This connection will carry over to Polish spaces, once we have appropriately 
generalized the definition of convergence in probability. 

Let (X_n: n = 1,2,...) be a sequence of random variables with values in a 
Polish space (Ψ, ρ). The sequence (X_n) converges in probability to a Ψ-valued 
random variable X if, for every ε > 0, 

    $\lim_{n\to\infty} P[\rho(X, X_n) > \varepsilon] = 0$. 

It is Cauchy in probability if, for every ε > 0, there is an integer l such that 

    $P[\rho(X_n, X_m) > \varepsilon] < \varepsilon$ 

whenever m, n ≥ l. 

Since a Polish space is a complete metric space, a sequence of random variables 
with values in a Polish space converges almost surely if and only if it is almost 
surely Cauchy. The statements and proofs of Theorem 2 and Lemma 24, both of 
Chapter 12, carry over to the present setting along with Lemma 3 and Problem 39 
of that same chapter. Thus, a sequence is Cauchy in probability if and only if it 
converges in probability. Moreover, almost sure convergence implies convergence 
in probability. Also, convergence in probability implies almost sure convergence 
for an appropriate subsequence. 




Proposition 9. Let X and X_n, n = 1,2,..., be random variables on a 
common probability space having values in a common Polish space. Suppose that 
the sequence (X_n: n = 1,2,...) converges to X in probability as n → ∞. Then 

    $X_n \xrightarrow{D} X$ as n → ∞. 


Problem 7. Prove the preceding proposition. Hint: Let g be a uniformly continuous 
bounded ℝ-valued function defined on the Polish space, and show that the sequence 
(g∘X_n: n = 1,2,...) of ℝ-valued random variables converges in probability to the 
ℝ-valued random variable g∘X. 


The next result says that if a sequence of probability measures on a Polish 
space converges in distribution, then so does any sequence induced from it by a 
continuous function into another Polish space. 


Proposition 10. Let Q and Q_n, n = 1,2,..., be probability measures on a 
common Polish space Ψ. Let h be a continuous function from Ψ to a Polish space 
T, and let R and R_n be the measures induced by h from Q and Q_n, respectively. 
If Q_n → Q as n → ∞, then R_n → R as n → ∞. 


* Problem 8. Prove the preceding proposition. 


Problem 9. Let X and X_n, n = 1,2,..., be C[0,1]-valued random variables and 
suppose that $X_n \xrightarrow{D} X$ as n → ∞. Prove that 

    $\max\{X_n(t) : 0 \le t \le 1\} \xrightarrow{D} \max\{X(t) : 0 \le t \le 1\}$ 

as n → ∞. 


18.3. Relative sequential compactness 


In this section we prove a basic fact about compactness in Polish spaces, 
introduce the concept of relative sequential compactness for families of probability 
distributions on a Polish space, and finally prove that any such family is relatively 
sequentially compact if the Polish space itself is compact. 

As described in Appendix B, a set is totally bounded if for every ε > 0, it is 
contained in the union of a finite collection of balls of radius less than ε. 


Proposition 11. A Polish space is compact if and only if it is totally bounded. 


Problem 10. Prove the preceding proposition. Hint: Use Proposition 2 and Propo- 
sition 3, both of Appendix B. 


Problem 11. Either by using the preceding proposition or by a direct argument, 
show that {x: |x(t)| ≤ 1 for all t ∈ [0,1]} is not a compact subset of the Polish 
space C[0,1] described in Problem 3. 




Problem 12. Without using Proposition 11, give a direct proof of the total bound- 
edness of the infinite-dimensional cube defined in Example 2. 


Definition 12. A family 𝒬 of probability distributions on a Polish space Ψ is 
relatively sequentially compact if every sequence (Q_n: n = 1,2,...) of members 
of 𝒬 has a convergent subsequence. 


The following lemma constitutes a major step in the identification of the 
relatively sequentially compact families of distributions on a Polish space. 


Lemma 13. Every family of probability distributions on the 
infinite-dimensional cube [0,1]^∞ is relatively sequentially compact. 


PROOF. Let (Q_n: n = 1,2,...) be a sequence of probability measures on 
[0,1]^∞. By Problem 6 there exists a continuous function g from [0,1] onto 
[0,1]^∞. Define f: [0,1]^∞ → [0,1] by 

    $f(x) = \inf\{t \in [0,1] : g(t) = x\}, \quad x \in [0,1]^\infty$. 

It follows from the continuity of g that g∘f is the identity function. The 
measurability of f follows from the fact that {x: f(x) ≤ a} is compact, being 
the image under g of the compact set [0,a] (see Problem 7 of Appendix B). 
For n = 1,2,..., let R_n be the probability measure induced on 
[0,1] by f from Q_n. By Theorem 13 of Chapter 14, there exists a convergent 
subsequence (R_{n_k}: k = 1,2,...) with limit equal to some probability measure R 
on [0,1]. Let Q be the measure induced on [0,1]^∞ by g from R. Note that for 
each n, Q_n is the measure induced by g from R_n. By Proposition 10, the 
subsequence (Q_{n_k}: k = 1,2,...) converges to Q. □ 


The following result is worth remembering, even though it is a special case of 
the forthcoming Theorem 17. 


Proposition 14. Every family of probability distributions on a compact Pol- 
ish space is relatively sequentially compact. 


Problem 13. Use Problem 7 of Appendix B and Lemma 4 and Lemma 13 of this 
chapter to prove the preceding proposition. 


We conclude this section with a simple result that is quite useful for proving 
convergence in distribution in Polish spaces. 


Proposition 15. Let (Q_n: n = 1,2,...) be a relatively sequentially compact 
sequence of probability measures on a Polish space such that every convergent 
subsequence has the same limiting probability measure Q. Then Q_n → Q as 
n → ∞. 




Problem 14. Prove the preceding proposition. Hint: See the solution of Proposi- 
tion 4 of Appendix B. 


18.4. Uniform tightness and the Prohorov Theorem 


In this section we identify necessary and sufficient conditions for a family of 
probability measures on a Polish space to be relatively sequentially compact. 


Definition 16. A probability distribution Q on a Polish space Ψ is tight if, for 
every ε > 0, there exists a compact subset K of Ψ such that Q(K^c) < ε. A 
family 𝒬 of probability distributions on Ψ is uniformly tight if, for every ε > 0, 
there exists a compact subset K of Ψ such that Q(K^c) < ε for every Q ∈ 𝒬. 


Some use the term ‘tight’ for a family to mean ‘uniformly tight’, but we will 
not use the abbreviated term. 
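Two families of point masses on ℝ, chosen here purely as an illustration, show the difference between tightness of each member and uniform tightness of the family:

```latex
% Illustration (an assumed example): point masses \delta_x on \mathbb{R}.
\begin{align*}
\mathcal{Q}_1 &= \{\delta_x : |x| \le M\}: &
  \delta_x\bigl([-M,M]^c\bigr) &= 0 < \varepsilon
  \ \text{ for every } \delta_x \in \mathcal{Q}_1,
  \text{ so } \mathcal{Q}_1 \text{ is uniformly tight;} \\
\mathcal{Q}_2 &= \{\delta_n : n = 1,2,\dots\}: &
  \delta_n(K^c) &= 1 \ \text{ whenever } n > \sup\{|x| : x \in K\},
  \text{ for any compact } K \subset \mathbb{R}.
\end{align*}
% Each \delta_n is tight (take K = \{n\}), but no single compact K works
% for all n, so \mathcal{Q}_2 is not uniformly tight.
```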


Theorem 17. [Prohorov] A family of probability measures on a Polish space 
Ψ is relatively sequentially compact if and only if it is uniformly tight. 


PROOF. Let 𝒬 be a uniformly tight family of probability measures on Ψ, and 
let (Q_n: n = 1,2,...) be a sequence of members of 𝒬. By definition, there exists, 
for each ε > 0, a compact set K_ε ⊂ Ψ with the property that Q_n(K_ε) > 1 - ε for 
all n. We use the function φ defined in Lemma 4 to transfer everything to the 
Polish space [0,1]^∞. Let C_ε = φ(K_ε). By Problem 7 of Appendix B, each C_ε is a 
compact subset of [0,1]^∞. Let (R_n: n = 1,2,...) be the sequence of probability 
measures on [0,1]^∞ induced by φ from the sequence (Q_n: n = 1,2,...). Note 
that R_n(C_ε) > 1 - ε for all n and ε. 

By Lemma 13 there is a subsequence (R_{n_k}: k = 1,2,...) that converges to a 
probability measure R on [0,1]^∞. By the Portmanteau Theorem, R(C_ε) ≥ 1 - ε 
for all ε. Since C_ε ⊂ φ(Ψ) for all ε, it follows that R(φ(Ψ)) = 1. Let Q be the 
measure induced by φ^{-1} on Ψ from R. It follows from the continuity of φ^{-1} and 
Proposition 10 that Q_{n_k} → Q as k → ∞. 

To prove the converse, suppose that 𝒬 is a relatively sequentially compact 
family of probability measures on Ψ. Let (ψ_n: n = 1,2,...) be a countable 
dense subset of Ψ, and for each δ > 0, let B(ψ_n, δ) be the open ball of radius δ 
about the point ψ_n. Let $\overline{B}(\psi_n,\delta)$ be the closure of B(ψ_n, δ). 

We now show that for each δ, there exists an integer p(δ) such that for all 
Q ∈ 𝒬, 

    $Q\Bigl(\bigcup_{n=1}^{p(\delta)} B(\psi_n,\delta)\Bigr) > 1 - \delta$. 

Suppose that such an integer p(δ) does not exist for some particular choice of 
δ > 0. Then for each positive integer m, there exists a probability measure 
Q_m ∈ 𝒬 such that 

    $Q_m\Bigl(\bigcup_{n=1}^{m} B(\psi_n,\delta)\Bigr) \le 1 - \delta$. 

By relative sequential compactness, the sequence (Q_m: m = 1,2,...) has a 
convergent subsequence with limit equal to some probability measure Q. By the 
Portmanteau Theorem, 

    $Q\Bigl(\bigcup_{n=1}^{m} B(\psi_n,\delta)\Bigr) \le 1 - \delta$ 

for all m. Since the collection of balls (B(ψ_n, δ): n = 1,2,...) covers the space Ψ, 
it follows, by Continuity of Measure, that Q(Ψ) ≤ 1 - δ. Since Q is a probability 
measure, we have derived a contradiction, so the integer p(δ) must exist for all 
δ > 0 as asserted. 

For each δ > 0, let 

    $C_\delta = \bigcup_{n=1}^{p(\delta)} \overline{B}(\psi_n,\delta)$. 

Each set C_δ is the union of a finite number of closed sets, and hence is closed 
(but not necessarily compact!). Fix ε > 0 and let 

    $K = \bigcap_{n=1}^{\infty} C_{\varepsilon/2^n}$. 

Since K is an intersection of closed sets, it is closed. By construction, K is 
totally bounded, so K is compact by Proposition 11. Since Q(C_δ) > 1 - δ for 
all δ > 0 and all Q ∈ 𝒬, an elementary calculation shows that Q(K) > 1 - ε for 
all Q ∈ 𝒬. □ 
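For the reader's convenience, the elementary calculation at the end of the proof can be written out as follows (a sketch using only countable subadditivity and the bound on each closed set in the intersection):

```latex
\begin{align*}
Q(K^c)
  = Q\Bigl(\bigcup_{n=1}^{\infty} \bigl(C_{\varepsilon/2^n}\bigr)^c\Bigr)
  \le \sum_{n=1}^{\infty} Q\bigl((C_{\varepsilon/2^n})^c\bigr)
  < \sum_{n=1}^{\infty} \frac{\varepsilon}{2^n}
  = \varepsilon ,
\end{align*}
% so Q(K) > 1 - \varepsilon, uniformly over all Q in the family.
```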


Corollary 18. Every probability measure on a Polish space is tight. 


* Problem 15. Prove that if Q is a probability measure on a Polish space Ψ, then 
for any open set A ⊂ Ψ, 

    Q(A) = sup{Q(K): K is compact and K ⊂ A}. 


18.5. Convergence in product spaces 


We begin with some terminology. For j = 1,2,..., let (Ψ_j, ρ_j) be Polish spaces, 
and let (Ψ, ρ) be the corresponding product Polish space (see Proposition 2). 
Let Λ = {j_1, j_2, ...} be any (finite or countably infinite) set of positive integers. 
For simplicity, assume that j_1 < j_2 < .... Let h_Λ: Ψ → Ψ_{j_1} × Ψ_{j_2} × ... be 
the function 

    $h_\Lambda(x_1, x_2, \dots) = (x_{j_1}, x_{j_2}, \dots)$, 

known as the projection of Ψ onto the coordinates indexed by Λ. It is easily 
checked that any such projection is continuous. 

If Q is a distribution on the Polish space Ψ, then the measure induced from 
Q by h_Λ is called the marginal of Q corresponding to Λ. If Λ is finite, then 
the measure induced from Q by h_Λ is known as a finite-dimensional marginal. 
If Λ has cardinality n, then the corresponding marginal is sometimes called 
an n-dimensional marginal. If Λ = {j}, then the corresponding 1-dimensional 
marginal is called the j-th coordinate marginal. 

It is easy to use the Uniqueness Theorem to show that a probability measure 
on a countable product of Polish spaces is determined by its finite-dimensional 
marginals. The next theorem says that convergence in distribution on a count- 
able product of Polish spaces is equivalent to convergence in distribution of each 
of the finite-dimensional marginal distributions. 


Theorem 19. Let Q_n, n = 1,2,..., and Q be distributions on the Polish 
space $\Psi = \prod_{j=1}^{\infty} \Psi_j$ that was defined in Proposition 2. If Q_n → Q 
as n → ∞, then for each set Λ ⊂ {1,2,...}, Q_n^Λ → Q^Λ as n → ∞, where Q_n^Λ 
and Q^Λ are the measures induced from Q_n and Q by the projection h_Λ. On the 
other hand, if for all finite sets Λ, there exists a measure Q^Λ such that 
Q_n^Λ → Q^Λ as n → ∞, then there exists a measure Q such that Q_n → Q as 
n → ∞, and, for each finite Λ, the marginal of Q corresponding to Λ is Q^Λ. 


PROOF. The first part of the theorem follows immediately from 
Proposition 10 and the fact that each of the functions h_Λ is continuous. 

For the proof of the second part, assume that for each finite Λ, there exists 
a measure Q^Λ such that Q_n^Λ → Q^Λ as n → ∞. We use the convergence of the 
1-dimensional marginals to prove that the sequence (Q_n) is uniformly tight. For 
each j = 1,2,..., let Q_n^j be the j-th coordinate marginal of Q_n. Since 
Q_n^j → Q^{{j}} as n → ∞, the sequence (Q_n^j) is uniformly tight for each fixed j. 
Fix ε > 0, and for each j, choose a compact set K_j ⊂ Ψ_j such that 
Q_n^j(K_j) > 1 - ε/2^j for all n. Let K = K_1 × K_2 × .... By the Tychonoff 
Theorem (see Appendix C), K is a compact subset of Ψ. For each n, 

    $Q_n(K) = Q_n\Bigl(\bigcap_{j=1}^{\infty} \{(x_1,x_2,\dots) : x_j \in K_j\}\Bigr) \ge 1 - \sum_{j=1}^{\infty} Q_n^j(K_j^c) > 1 - \varepsilon$. 

Thus the sequence (Q_n) is uniformly tight. 
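The estimate just used is a union bound over the coordinates. Spelled out step by step (a sketch in the notation of the proof):

```latex
\begin{align*}
Q_n(K^c)
  &= Q_n\Bigl(\bigcup_{j=1}^{\infty} \{(x_1,x_2,\dots) : x_j \notin K_j\}\Bigr)
   \le \sum_{j=1}^{\infty} Q_n\bigl(\{(x_1,x_2,\dots) : x_j \notin K_j\}\bigr) \\
  &= \sum_{j=1}^{\infty} Q_n^j(K_j^c)
   < \sum_{j=1}^{\infty} \frac{\varepsilon}{2^j}
   = \varepsilon .
\end{align*}
% The middle equality is the definition of the j-th coordinate marginal.
```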

Since (Q_n) is uniformly tight, there exist a measure R and a subsequence 
(Q_{n_k}) such that Q_{n_k} → R as k → ∞. By the first part of the theorem, the 
finite-dimensional marginals of the terms in the sequence (Q_{n_k}) converge to the 
finite-dimensional marginals of R. Thus, the finite-dimensional marginals of 
R are the measures Q^Λ. Any other subsequential limit must have these same 
marginals, and thus be equal to R. An application of Proposition 15 completes 
the proof. □ 




Problem 16. Let Ψ = Ψ_1 × Ψ_2 × ... be a countable product of Polish spaces, as in 
Theorem 19, and let (Q_n) be a sequence of probability measures on Ψ. Show that 
(Q_n) is uniformly tight if and only if (Q_n^j) is uniformly tight for each j, where 
Q_n^j is the j-th coordinate marginal of Q_n. 


Problem 17. Let Ψ be as in the preceding problem, and for each n, let X_n be a 
Ψ_n-valued random variable. Show that as n → ∞, 


E Cremer, Cre cree ET ka coe Cue. Can 


Problem 18. Describe how to use the previous result to quickly obtain Theorem 16 
of Chapter 9 as a corollary of Theorem 7 of that same chapter in the Polish space 
setting. 


In general, the hypothesis of the second part of Theorem 19 cannot be weak- 
ened in any significant way. For example, it would not be enough for all the 
n-dimensional marginals to converge for some fixed n. But there is one im- 
portant special case in which the convergence of the 1-dimensional marginals is 
sufficient. 


Theorem 20. Let J be a countable set, and for j ∈ J, let Ψ_j be a Polish 
space. Set 

    $\Psi = \prod_{j \in J} \Psi_j$, 

viewed as a Polish space via Proposition 2. For n = 1,2,..., let 

    $Q_n = \bigotimes_{j \in J} Q_n^j$ 

be a product measure on Ψ, where for each j and n, Q_n^j is a probability 
distribution on the Borel subsets of Ψ_j. Then the sequence (Q_n) converges to a 
distribution Q as n → ∞ if and only if for each j, the sequence (Q_n^j) converges 
to a distribution Q^j on Ψ_j, in which case 

    $Q = \bigotimes_{j \in J} Q^j$. 

PARTIAL PROOF. For the ‘only if’ aspect, note that for each j and n, the 
probability measure Q_n^j is the j-th coordinate marginal of Q_n. Thus, if Q_n → Q 
as n → ∞, it follows from Theorem 19 that for each j, the sequence (Q_n^j) 
converges as n → ∞ to the j-th coordinate marginal of Q, as desired. 

For the proof of the ‘if’ portion we focus on the case J = {1,2} and leave 
the rest to the reader. Let g be an ℝ-valued bounded continuous function on 
Ψ_1 × Ψ_2. By the Fubini Theorem, it is enough to show that 

(18.7)    $\lim_{n\to\infty} \int_{\Psi_2} h_n \, dQ_n^2 = \int_{\Psi_2} h \, dQ^2$, 

where 

    $h_n(y) = \int_{\Psi_1} g(x,y)\, Q_n^1(dx)$  and  $h(y) = \int_{\Psi_1} g(x,y)\, Q^1(dx)$. 

Let ε > 0 and note that 

(18.8)    $\Bigl|\int h_n \, dQ_n^2 - \int h \, dQ^2\Bigr| \le \Bigl|\int h \, dQ_n^2 - \int h \, dQ^2\Bigr| + \int_{\Psi_2 \setminus C} |h_n - h| \, dQ_n^2 + \int_{C} |h_n - h| \, dQ_n^2$ 

for any compact C ⊂ Ψ_2. There exists l such that the first term on the right is 
less than ε/3 for n ≥ l, since h inherits continuity and boundedness from g and 
since Q_n^2 → Q^2 as n → ∞. The second term on the right is less than ε/3 for all 
n and some C, since all |h_n| and |h| inherit a common bound from g and since 
the sequence (Q_n^2) is uniformly tight. 

For the last term on the right of (18.8), cover C by a finite number of sets 
B_1, ..., B_k having the property that |g(x_1, u) - g(x_1, v)| < ε/9 whenever u and 
v are in a common B_i. Fix u_i ∈ B_i. There exist l_i such that for n ≥ l_i, the 
following calculation is valid for all v_i ∈ B_i: 

    $|h_n(v_i) - h(v_i)| \le |h_n(v_i) - h_n(u_i)| + |h_n(u_i) - h(u_i)| + |h(u_i) - h(v_i)| < 3\bigl(\tfrac{\varepsilon}{9}\bigr) = \tfrac{\varepsilon}{3}$. 

Therefore |h_n(v) - h(v)| < ε/3 for v ∈ C and n ≥ max{l_i: 1 ≤ i ≤ k}, and thus 
the left side of (18.8) is less than ε for n ≥ l ∨ max{l_i: 1 ≤ i ≤ k}. □ 


Problem 19. Complete the proof of the preceding theorem by first treating the 
case of #J < ∞ by mathematical induction and then treating the case of J being 
countably infinite. 


18.6. The Continuity Theorem for ℝ^d 


The following result generalizes Theorem 15 of Chapter 14. 


Theorem 21. [Continuity (for Characteristic Functions in ℝ^d)] A sequence 
of probability distributions on ℝ^d converges to a probability distribution Q if and 
only if the sequence of corresponding characteristic functions converges pointwise 
to a function γ which is continuous at 0, in which case the convergence to γ is 
uniform on each compact subset of ℝ^d, and γ is the characteristic function of Q. 
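The role of continuity at 0 can be seen in a standard one-dimensional example, given here as an illustration (normal distributions with variance n or 1/n are the assumed choice):

```latex
% Illustration with d = 1 (an assumed example).
\begin{align*}
Q_n = N(0,n): \quad
  & \beta_n(v) = e^{-n v^2/2} \longrightarrow
    \begin{cases} 1 & \text{if } v = 0 \\ 0 & \text{if } v \ne 0, \end{cases} \\
Q_n = N(0,1/n): \quad
  & \beta_n(v) = e^{-v^2/(2n)} \longrightarrow 1 \quad \text{for every } v .
\end{align*}
% In the first case the pointwise limit is discontinuous at 0, and (Q_n) does
% not converge: Q_n([-M,M]) -> 0 for every M, so mass escapes to infinity.
% In the second case the limit is the characteristic function of the point
% mass at 0, and Q_n converges to that point mass.
```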


PARTIAL PROOF. We leave it as a problem to prove that if (Q_n: n = 1,2,...) 
is a sequence of probability measures on ℝ^d that converges to a probability 
measure Q, then (β_n: n = 1,2,...) converges to β uniformly on each compact 
set, where β_n and β are the characteristic functions of Q_n and Q, respectively. 

For the converse, assume that the sequence (β_n: n = 1,2,...) converges 
pointwise to a function γ that is continuous at 0 ∈ ℝ^d. It follows from Problem 58 
of Chapter 13 that, for each j, the characteristic functions of the j-th coordinate 
marginals converge to a function that is continuous at 0 ∈ ℝ. By the 
Continuity Theorem for ℝ, the 1-dimensional marginals converge. By Problem 16, 
the sequence (Q_n) is uniformly tight. By the first part of this theorem, every 
convergent subsequence of (Q_n) has a limit with characteristic function γ. By 
Theorem 16 of Chapter 13, these convergent subsequences all have the same 
limit. An application of Proposition 15 completes the proof. □ 


Problem 20. Complete the proof of the preceding theorem by doing the first por- 
tion of the proof. 


Problem 21. Find a sequence ((X_n, Y_n): n = 1,2,...) which does not converge 
in distribution (as a sequence of ℝ^2-valued random variables), but for which both 
of the sequences (X_n: n = 1,2,...) and (Y_n: n = 1,2,...) converge in 
distribution. For your example, find a w ∈ ℝ^2 for which E(exp(i⟨w, (X_n, Y_n)⟩)) 
does not converge. 


Problem 22. Let (X_n: n = 1,2,...) be an iid sequence of ℝ-valued random 
variables. Show that as n → ∞, 


(iia TE e 


The preceding problem shows that the lack of independence is not necessar- 
ily preserved when passing to a limit. Theorem 20 says that independence is 
preserved. 

In the proof of Theorem 21, we found it useful to analyze a sequence of 
distributions on ℝ^d in terms of related distributions on ℝ. The following theorem 
extends this idea. 


Theorem 22. [Cramér-Wold Device] Let d < ∞. A sequence (X_n: n = 
1,2,...) of ℝ^d-valued random variables converges in distribution to a random 
variable X if and only if each sequence (⟨w, X_n⟩: n = 1,2,...), w ∈ ℝ^d, 
converges in distribution, in which case 

(18.9)    $\langle w, X_n \rangle \xrightarrow{D} \langle w, X \rangle$ 

for each w ∈ ℝ^d. 


PROOF. The convergence in (18.9) follows immediately from Proposition 10. 

For the converse, suppose that (⟨w, X_n⟩) converges in distribution for each w. 
It is clear that if (X_n) converges in distribution or has a convergent subsequence, 
the limit must have that unique distribution whose characteristic function is 

    $w \mapsto \lim_{n\to\infty} E\bigl(e^{i\langle w, X_n\rangle}\bigr)$. 

In view of the Prohorov Theorem, we only need show uniform tightness of 
{Q_n: n = 1,2,...}, where Q_n is the distribution of X_n, because then an 
appeal to Proposition 15 completes the proof. By setting w = (0,...,0,1,0,...,0), 
with 1 in the j-th position, we see that the sequence (X_{j,n}: n = 1,2,...) of 
j-th coordinates converges in distribution. Thus, for each j, the sequence of j-th 
coordinate marginals of (Q_n) is uniformly tight, so (Q_n) is uniformly tight by 
Problem 16. □ 


Problem 23. Show that the hypothesis of convergence of (⟨w, X_n⟩) for all w in the 
preceding theorem cannot be replaced by the hypothesis of convergence for all w 
in some basis of ℝ^d. 


Theorem 23. [Multi-dimensional Central Limit] Let d be a positive integer. 
Let (X_1, X_2, ...) be an iid sequence of ℝ^d-valued random variables having finite 
mean vector μ = E(X_1) and finite covariance matrix 

    $\Sigma = E\bigl([X_1 - \mu]^T [X_1 - \mu]\bigr)$, 

where [X_1 - μ] denotes the row matrix corresponding to the vector X_1 - μ and 
[·]^T denotes transpose. Then 

    $\dfrac{X_1 + \dots + X_n - n\mu}{\sqrt{n}} \xrightarrow{D} Z$ as n → ∞, 

where Z is a normally distributed ℝ^d-valued random variable with mean vector 
0 and covariance matrix Σ. 
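The following simulation is a rough sanity check of the covariance structure in the theorem for d = 2. The particular distribution (X = (U, U + V) with U and V independent and uniform on [-1, 1]), the sample sizes, and the function names are all assumptions made for illustration. Note that it checks only second moments, which agree with Σ for every n; it does not by itself verify normality of the limit.

```python
import math
import random

def normalized_sum(rng, n):
    """Return (X_1 + ... + X_n - n*mu) / sqrt(n) for iid X_i = (U_i, U_i + V_i),
    with U_i, V_i independent and uniform on [-1, 1], so that mu = (0, 0)."""
    s1 = s2 = 0.0
    for _ in range(n):
        u = rng.uniform(-1.0, 1.0)
        v = rng.uniform(-1.0, 1.0)
        s1 += u
        s2 += u + v
    return s1 / math.sqrt(n), s2 / math.sqrt(n)

def sample_covariance(points):
    """Sample covariance entries (var1, cov12, var2) of a list of 2-d points."""
    m = len(points)
    m1 = sum(p[0] for p in points) / m
    m2 = sum(p[1] for p in points) / m
    var1 = sum((p[0] - m1) ** 2 for p in points) / (m - 1)
    var2 = sum((p[1] - m2) ** 2 for p in points) / (m - 1)
    cov12 = sum((p[0] - m1) * (p[1] - m2) for p in points) / (m - 1)
    return var1, cov12, var2

rng = random.Random(0)  # fixed seed so the run is reproducible
draws = [normalized_sum(rng, n=50) for _ in range(2000)]
var1, cov12, var2 = sample_covariance(draws)
# Covariance matrix of X_1: Var(U) = 1/3, Cov(U, U + V) = 1/3, Var(U + V) = 2/3.
```

The three sample values should be close to 1/3, 1/3, and 2/3, the entries of Σ for this choice of X_1.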


* Problem 24. Use the Classical Central Limit Theorem and the Cramér-Wold De- 
vice to prove the preceding theorem. 


Problem 25. Apply the Multi-dimensional Central Limit Theorem to the iid se- 
quence (Z1, Z2,...) where each Z; has the same distribution as the random vector 
Z of Problem 60 of Chapter 13. 


18.7. {| The Prohorov metric 


Let 𝒬(Ψ) denote the family of all probability distributions on a Polish space 
(Ψ, ρ). Our final goal of this chapter is to turn 𝒬(Ψ) into a Polish space with 
a metric ρ̂ so that convergence of sequences in the Polish space (𝒬(Ψ), ρ̂) is 
equivalent to convergence of sequences of distributions on Ψ. The metric ρ̂ we 
will use is called the Prohorov metric and is defined by 

(18.10)    $\hat\rho(Q,R) = \inf\{\varepsilon : R(A) \le Q(A_\varepsilon) + \varepsilon \text{ for every Borel } A \subset \Psi\}$, 

where 

    $A_\varepsilon = \{x : \rho(x,y) < \varepsilon \text{ for some } y \in A\}$. 


(Note: It is important that A_ε be Borel; indeed, it is open, since it is the union 
of open balls.) 
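For distributions on ℝ with finite support, the infimum in (18.10) can be approximated by brute force. The sketch below is an illustration under that finite-support assumption; the function names are hypothetical, distributions are represented as dictionaries mapping support points to masses, and the infimum is located by bisection, which is valid because the defining condition is monotone in ε.

```python
from itertools import combinations

def neighborhood_mass(dist, subset, eps):
    """Mass that dist gives to the open eps-neighborhood of a finite set of reals."""
    return sum(p for x, p in dist.items()
               if any(abs(x - y) < eps for y in subset))

def condition_holds(Q, R, eps):
    """Check R(A) <= Q(A_eps) + eps for every nonempty A inside the support of R.

    For finitely supported R this suffices: intersecting A with the support
    leaves R(A) unchanged and can only shrink A_eps.
    """
    support = list(R)
    for size in range(1, len(support) + 1):
        for subset in combinations(support, size):
            r_mass = sum(R[y] for y in subset)
            # small slack guards against floating-point rounding
            if r_mass > neighborhood_mass(Q, subset, eps) + eps + 1e-12:
                return False
    return True

def prohorov_distance(Q, R, tol=1e-9):
    """Bisection for inf{eps : condition holds}; eps = 1 always satisfies it."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if condition_holds(Q, R, mid):
            hi = mid
        else:
            lo = mid
    return hi

# Point mass at 0 versus point mass at 0.3.
d1 = prohorov_distance({0.0: 1.0}, {0.3: 1.0})
# Point mass at 0 versus equal masses at 0 and 2.
d2 = prohorov_distance({0.0: 1.0}, {0.0: 0.5, 2.0: 0.5})
```

In the first example the distance is governed by how far the mass has moved, in the second by how much mass has moved far away. The subset enumeration is exponential in the support size, so this is suitable only for very small examples.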


* Problem 26. Choose δ, ε > 0. Prove that if R(A) ≤ Q(A_ε) + δ for all Borel sets 
A ⊂ Ψ, then Q(A) ≤ R(A_ε) + δ for all Borel sets A ⊂ Ψ. 


Problem 27. Prove that ρ̂ defined by (18.10) is a metric. 


From the preceding problem we see that (𝒬(Ψ), ρ̂) is a metric space. Let C 
be a countable dense subset of Ψ. Then it is easy to show that the set 𝒞 of 
probability distributions whose values are rational and whose supports are finite 
subsets of C is a countable dense subset of 𝒬(Ψ). We have almost proved the 
following theorem. 


Theorem 24. Let (Ψ, ρ) be a Polish space and let 𝒬(Ψ) denote the family 
of all probability measures on (Ψ, ρ). Define ρ̂ by (18.10). Then (𝒬(Ψ), ρ̂) is a 
Polish space. 


PARTIAL PROOF. In view of the discussion preceding the theorem, we need 
only show that (𝒬(Ψ), ρ̂) is complete. Let (Q_n: n = 1,2,...) be an arbitrary 
Cauchy sequence. To prove the convergence of a Cauchy sequence, it is enough to 
find a subsequence that converges. By the Prohorov Theorem, it is enough to find 
a subsequence that is uniformly tight. Using the definition of Cauchy sequence, a 
routine argument shows that there exists a subsequence (R_n: n = 1,2,...) with 
the property that ρ̂(R_n, R_{n+1}) < 1/2^{n+1} for n = 1,2,.... We will prove that 
this subsequence is uniformly tight. 

Fix £ > 0. We will define a sequence of compact subsets of Y. Choose a 
positive integer l such that 1/2! < €/2. By Corollary 18 we can find a compact 
set K such that R,(K) > 1 —€/2 for n = 1,...,l. Let K; = K for j =1,...,l. 


We now proceed recursively to define compact sets Kı+1, Ki42,..., such that for 
all n >l, 
1 
Ry(Kn) >1- (1 = Te 
and 
(18.11) Kni C (Kn)o-in+13) , 
where 
1 
(Kn)ijzn+1 = {2 E Y: p(z,y) < S for some y € Kn}. 


We have already defined K; with the desired properties. Assume that sets 
Kı,..., Kn have been defined with the desired properties for some n > L. By the 
definition of p and our choice of the integer J, 


1 1 
Rn+i((Kn)1/2.41) > Rn(Kn) = nti >1- (1 = papa )E- 




It follows from Problem 15 that there exists a compact set $K_{n+1}$ satisfying (18.11)
such that $R_{n+1}(K_{n+1}) > 1 - (1 - 2^{-(n+2-l)})\varepsilon$. The recursive construction of the
sequence $(K_n: n = 1,2,\dots)$ of compact sets is complete.

Let
$$K = \text{the closure of } \bigcup_{n=1}^{\infty} K_n.$$
Clearly $R_n(K) > 1 - \varepsilon$ for all $n$. The proof of uniform tightness is completed by
showing that $K$ is compact. By Proposition 11, it is enough to prove that $K$ is
totally bounded. This final step is left as an exercise. □


Problem 28. Prove that the set $K$ defined in the preceding proof is totally bounded.
Hint: Show that for each $\varepsilon > 0$, a sufficiently large value of $n$ can be found so that
any covering of $K_1 \cup \dots \cup K_n$ by $\varepsilon$-balls can be ‘inflated’ to a covering of $K$ by
doubling the diameter of each of the balls.


Problem 29. For the metric space of distributions on R, calculate the distance 
between the uniform distribution on (0, 1] and the uniform distribution on [a,a+1]. 
Also calculate the distance between the uniform distributions on [0, 8] and [0, 9]. 


Problem 30. For the metric space of distributions on $\mathbb{R}^2$, calculate the distance
between the delta distribution at $(0,0)$ and the uniform distribution on the square
region with vertices at $(\pm a, \pm a)$.


Problem 31. For the metric space of distributions on $\mathbb{R}^\infty$, calculate the distance
between the delta distribution at $(0,0,0,\dots)$ and the distribution $Q_d$ induced by
the uniform distribution on $[-1,1]$ in $\mathbb{R}$ and the function $\varphi_d\colon \mathbb{R} \to \mathbb{R}^\infty$ defined by
$$\varphi_d(x) = (0,\dots,0,x,0,0,\dots),$$
where $x$ on the right side is the $d$-th term.


Theorem 25. Let $\mathcal{Q}(\Psi)$ be the Polish space described in Theorem 24. Then
$\hat\rho(Q, Q_n) \to 0$ as $n \to \infty$ if and only if $Q_n \to Q$ as $n \to \infty$.


PROOF. Suppose that $\hat\rho(Q, Q_n) \to 0$ and let $C$ be a closed set. For each $\varepsilon > 0$
there exists an integer $l$ such that
$$Q_n(C) \le Q(C_\varepsilon) + \varepsilon$$
for $n \ge l$. Hence,
$$\limsup_{n\to\infty} Q_n(C) \le Q(C_\varepsilon) + \varepsilon.$$
Now let $\varepsilon \searrow 0$ through a sequence and use the equation
$$C = \bigcap_{\varepsilon > 0} C_\varepsilon,$$
which is a consequence of the fact that $C$ is closed, to conclude
$$\limsup_{n\to\infty} Q_n(C) \le Q(C).$$
An appeal to the Portmanteau Theorem completes this half of the proof.
For the converse assume that $\int g\,dQ_n \to \int g\,dQ$ for all continuous bounded
functions $g$. By the Portmanteau Theorem,
$$\lim_{n\to\infty} Q_n(A) = Q(A) \tag{18.12}$$
for every Borel set $A$ for which $Q(\partial A) = 0$.
Let $\varepsilon > 0$ and let $(x_j: j = 1,2,\dots)$ be a dense sequence in $\Psi$. For each $j$, let
$B_j$ be a ball centered at $x_j$ with radius strictly between $\varepsilon/4$ and $\varepsilon/2$ and having the
additional property that $Q(\partial B_j) = 0$. For each $j$, set
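$$C_j = B_j \setminus \bigcup_{i=1}^{j-1} B_i.$$

It is clear that:
• $C_i \cap C_j = \emptyset$ if $i \ne j$;
• $\bigcup_{j=1}^{\infty} C_j = \Psi$;
• $Q(\partial C_j) = 0$ for each $j$;
• the distance between any two members of any one $C_j$ is less than $\varepsilon$.
Choose $k$ so that $\sum_{j=1}^{k} Q(C_j) > 1 - \frac{\varepsilon}{2}$ and then use (18.12) to deduce the
existence of $l$ such that
$$Q(C_j) < Q_n(C_j) + \frac{\varepsilon}{2k}, \quad \text{for } j \le k,\ n \ge l.$$

Let $A$ be any Borel set and denote by $A_\varepsilon$ the set of points that each lie no more
than distance $\varepsilon$ from some point in $A$. Let $(C_{j_1}, C_{j_2}, \dots, C_{j_h})$ be the subsequence
of $(C_j)$ consisting of those $C_j$ for which $j \le k$ and $C_j \cap A \ne \emptyset$. Then, for $n \ge l$,
$$Q(A) < \frac{\varepsilon}{2} + \sum_{i=1}^{h} Q(C_{j_i}) < \varepsilon + \sum_{i=1}^{h} Q_n(C_{j_i}) \le \varepsilon + Q_n(A_\varepsilon).$$

Therefore the distance between $Q_n$ and $Q$ is less than $\varepsilon$ for $n \ge l$. □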


CHAPTER 19 


The Invariance Principle 
and Brownian Motion 


In this chapter, we bring together several of the key ideas of previous chapters to 
construct one of the most important objects in all of probability theory, namely 
‘Brownian motion’. In order to apply the theory developed in Chapter 18, we will 
first take the point of view that Brownian motion is a random variable that takes 
values in the Polish space C[0,1] and later switch to the Polish space C[0, ∞).
We will find that when a random walk on R with finite variance is converted
to a C[0, 1]-valued random variable in a natural way, then it can be centered
and scaled so that its distribution approximates that of Brownian motion. As a 
consequence, much can be learned about the asymptotic properties of random 
walks by looking at the properties of Brownian motion. We rely frequently 
on the material concerning Polish spaces in Chapter 18. The Classical Central 
Limit Theorem (Chapter 15) and the Arzela-Ascoli Theorem (Theorem 5 of 
Appendix B) also play important roles. 

We include only a small fraction of the many interesting features of Brownian 
motion on [0, ∞) that have been discovered over the years, but the ones we do
include indicate the variety and subtlety of such results. Some of these results 
involve stopping times, so this chapter also contains the basic theory needed 
to extend ideas from Chapter 11 about filtrations and stopping times to the 
continuous time setting. 

Random variables $X$ that are $C[0,1]$-valued play a central role in this chapter.
Thus, $X(\omega)$ is a continuous function from $[0,1]$ to $\mathbb{R}$. Its value at $t \in [0,1]$ will
be denoted by $X_t(\omega)$. Hence $X_t$ is an $\mathbb{R}$-valued random variable. An alternative
notation for $X$ is
$$\omega \rightsquigarrow [t \rightsquigarrow X_t(\omega)].$$
When speaking of events involving either the $C[0,1]$-valued random variable $X$
or an $\mathbb{R}$-valued random variable $X_t$ we will sometimes suppress $\omega$. The notation
$(X^{(n)}: n = 1,2,\dots)$ is a typical description of a sequence of $C[0,1]$-valued
random variables.




19.1. Certain sequences of distributions on C[0, 1] 


Let $(Y_1, Y_2, \dots)$ be an iid sequence of $\mathbb{R}$-valued random variables having finite
mean $\mu$ and positive finite variance $\sigma^2$. For $n = 1,2,\dots$, let $C[0,1]$-valued
random variables $X^{(n)}$ be defined as follows. First, for $m = 0,\dots,n$, set
$$X^{(n)}_{m/n}(\omega) = \frac{1}{\sigma\sqrt{n}} \sum_{k=1}^{m} (Y_k(\omega) - \mu) \tag{19.1}$$
and then extend $X^{(n)}(\omega)$ to be continuous on $[0,1]$ by making it linear on each of
the intervals $[\frac{m-1}{n}, \frac{m}{n}]$, $m = 1,\dots,n$. (See Example 2 and Figures 2.1, 2.2, and
2.3, all of Chapter 2.) The following problems and lemma constitute a first step
towards the goal of proving that $(X^{(n)}: n = 1,2,\dots)$ converges in distribution.
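
The construction around (19.1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `donsker_path` is hypothetical, and the caller is assumed to supply the true mean `mu` and standard deviation `sigma` of the steps.

```python
import math
import random

def donsker_path(ys, mu, sigma):
    """Return the function t -> X_t of the piecewise-linear path built from the steps ys."""
    n = len(ys)
    scale = sigma * math.sqrt(n)
    # Knot values X_{m/n} = (1 / (sigma * sqrt(n))) * sum_{k <= m} (Y_k - mu), m = 0..n.
    knots = [0.0]
    for y in ys:
        knots.append(knots[-1] + (y - mu) / scale)

    def x(t):
        if not 0.0 <= t <= 1.0:
            raise ValueError("t must lie in [0, 1]")
        m = min(int(t * n), n - 1)   # t lies in [m/n, (m+1)/n]
        frac = t * n - m             # relative position inside that interval
        return (1.0 - frac) * knots[m] + frac * knots[m + 1]

    return x

# Example: fair +/-1 steps, for which mu = 0 and sigma = 1.
rng = random.Random(0)
steps = [rng.choice((-1, 1)) for _ in range(100)]
path = donsker_path(steps, mu=0.0, sigma=1.0)
```

Evaluating `path` at a multiple of 1/n returns the scaled partial sum of (19.1); at other times it returns the linear interpolation between the two neighboring knots.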


Problem 1. Suppose that $\mathbb{R}^d$-valued random variables $Z_n$ and $U_n$, $n = 1,2,\dots$,
have the property that as $n \to \infty$, $Z_n \xrightarrow{\mathcal{D}} Z$ for some $Z$ and $U_n \xrightarrow{\mathcal{D}} 0$. Prove that
$Z_n + U_n \xrightarrow{\mathcal{D}} Z$ as $n \to \infty$.


Problem 2. Suppose that a sequence $(Z_n: n = 1,2,\dots)$ of $\mathbb{R}$-valued random variables
converges in distribution to an $\mathbb{R}$-valued random variable $Z$ and that a sequence
$(c_n: n = 1,2,\dots)$ of real numbers converges to 1. Prove that $c_n Z_n \xrightarrow{\mathcal{D}} Z$
as $n \to \infty$.


Lemma 1. Let $0 \le t_0 < t_1 < t_2 < \dots < t_d \le 1$ and let $X^{(n)}$ be defined by
the sentence containing (19.1). Then the sequence
$$\big((X^{(n)}_{t_1} - X^{(n)}_{t_0}, \dots, X^{(n)}_{t_d} - X^{(n)}_{t_{d-1}}): n = 1,2,\dots\big)$$
of random vectors converges in distribution to a random vector having independent
normally distributed coordinates with the mean and variance of the $j$-th
coordinate equaling $0$ and $(t_j - t_{j-1})$, respectively.


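PROOF. For $0 < j \le d$ and $n = 1,2,\dots$, set
$$r_{j,n} = \frac{\lfloor n t_j \rfloor}{n} \quad\text{and}\quad U_{j,n} = X^{(n)}_{t_j} - X^{(n)}_{r_{j,n}}.$$
For $0 \le j < d$ and $n = 1,2,\dots$, set
$$s_{j,n} = \frac{\lceil n t_j \rceil}{n} \quad\text{and}\quad V_{j,n} = X^{(n)}_{s_{j,n}} - X^{(n)}_{t_j}.$$
Then for $n$ sufficiently large,
$$t_0 \le s_{0,n} < r_{1,n} \le t_1 \le s_{1,n} < r_{2,n} \le \dots \le t_{d-1} \le s_{d-1,n} < r_{d,n} \le t_d,$$
and
$$\begin{aligned}
(X^{(n)}_{t_1} - X^{(n)}_{t_0}, \dots, X^{(n)}_{t_d} - X^{(n)}_{t_{d-1}})
&= (X^{(n)}_{r_{1,n}} - X^{(n)}_{s_{0,n}}, \dots, X^{(n)}_{r_{d,n}} - X^{(n)}_{s_{d-1,n}}) \\
&\quad + (U_{1,n}, \dots, U_{d,n}) + (V_{0,n}, \dots, V_{d-1,n}).
\end{aligned} \tag{19.2}$$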


The mean of each coordinate in the last two terms is 0 and the variances 
approach 0 as n > oo. By the Chebyshev Inequality it follows that each of the 
last two terms on the right side of (19.2) converges to (0,...,0) in distribution. 
Therefore in view of Problem 1 we only need prove that the sequence 


$$\big((X^{(n)}_{r_{1,n}} - X^{(n)}_{s_{0,n}}, \dots, X^{(n)}_{r_{d,n}} - X^{(n)}_{s_{d-1,n}}): n = 1,2,\dots\big) \tag{19.3}$$


converges in distribution to the limit described in the lemma. 

The coordinates in each term of the sequence (19.3) are independent, and by 
Theorem 20 of Chapter 18, independence is preserved on passing to the limit. 
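Thus, we only need show for $1 \le j \le d$, that the sequence
$$\big(X^{(n)}_{r_{j,n}} - X^{(n)}_{s_{j-1,n}}: n = 1,2,\dots\big)$$
converges in distribution to a normally distributed random variable with mean 0
and variance $(t_j - t_{j-1})$. By the Classical Central Limit Theorem the sequence
$$\Big(\sqrt{\frac{t_j - t_{j-1}}{r_{j,n} - s_{j-1,n}}}\,\big(X^{(n)}_{r_{j,n}} - X^{(n)}_{s_{j-1,n}}\big): n = 1,2,\dots\Big)$$
does have this property. Clearly,
$$\lim_{n\to\infty} \sqrt{\frac{r_{j,n} - s_{j-1,n}}{t_j - t_{j-1}}} = 1.$$
An appeal to Problem 2 completes the proof. □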
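
As a numerical companion to Lemma 1 (a sketch, not part of the proof), the variance of an increment of $X^{(n)}$ can be computed exactly and compared with $t - s$. The helper name `increment_variance` is hypothetical. The computation assumes only that the standardized steps $(Y_k - \mu)/\sigma$ are uncorrelated with variance 1, so that the variance of a linear combination is the sum of the squared coefficients.

```python
from fractions import Fraction

def increment_variance(n, s, t):
    """Exact variance of X_t^(n) - X_s^(n) for 0 <= s <= t <= 1."""
    def weight(k, u):
        # Fraction of the k-th step already traversed at time u, clamped to [0, 1].
        return min(max(n * u - (k - 1), Fraction(0)), Fraction(1))
    return sum((weight(k, t) - weight(k, s)) ** 2 for k in range(1, n + 1)) / n

s, t = Fraction(1, 3), Fraction(3, 4)
for n in (4, 40, 400):
    print(n, float(increment_variance(n, s, t)), float(t - s))
```

Only the two partially traversed steps at the ends of the interval contribute a discrepancy, which is why the variance approaches $t - s$ at rate $1/n$.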


A $C[0,1]$-valued random variable $W$ has stationary increments if for $0 \le s <
t \le 1$, the distribution of $(W_t - W_s)$ depends only on $t - s$. It has independent
increments if for $d$ any positive integer and $0 \le t_0 < t_1 < \dots < t_d \le 1$, the
random variables $(W_{t_i} - W_{t_{i-1}})$, $1 \le i \le d$, are independent. The term ‘independent
increments’ is perhaps not ideal, since it does not reflect the fact that
independence is only required for time intervals having nonoverlapping interiors.


Lemma 2. Let $X^{(n)}$ be defined by the sentence containing (19.1). Then any
subsequential distributional limit $W$ of the sequence $(X^{(n)}: n = 1,2,\dots)$ has the
following three properties:

(i) for $0 \le t \le 1$, $W_t$ is normally distributed with mean 0 and variance $t$;
(ii) $W$ has stationary increments;
(iii) $W$ has independent increments.
Moreover, if $\widetilde{W}$ has the properties (i)-(iii) in common with $W$, then $W$ and
$\widetilde{W}$ have the same distribution.




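PROOF. Suppose that $X^{(n_k)} \xrightarrow{\mathcal{D}} W$ as $k \to \infty$. Since the function
$$x \rightsquigarrow ([x(t_1) - x(t_0)], [x(t_2) - x(t_1)], \dots, [x(t_d) - x(t_{d-1})])$$
from $C[0,1]$ to $\mathbb{R}^d$ is continuous, it follows from Proposition 10 of Chapter 18
that
$$\big((X^{(n_k)}_{t_1} - X^{(n_k)}_{t_0}), \dots, (X^{(n_k)}_{t_d} - X^{(n_k)}_{t_{d-1}})\big)
\xrightarrow{\mathcal{D}} \big((W_{t_1} - W_{t_0}), \dots, (W_{t_d} - W_{t_{d-1}})\big).$$
By Lemma 1, $W$ satisfies (i)-(iii).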

The distributions of any $C[0,1]$-valued random variables $W$ and $\widetilde{W}$ satisfying
(i)-(iii) are supported by $\{x \in C[0,1]: x(0) = 0\}$ because of (i), and thus,
because of (i)-(iii), agree on $\mathcal{E}$, the class of sets of the form
$$\{x \in C[0,1]: x(0) \in A,\ ([x(t_1) - x(t_0)], \dots, [x(t_d) - x(t_{d-1})]) \in B\},$$
where $d$ is a positive integer, $0 = t_0 < t_1 < \dots < t_d \le 1$, $A$ is a Borel subset of
$\mathbb{R}$, and $B$ is a Borel subset of $\mathbb{R}^d$. Clearly $\mathcal{E}$ is closed under finite intersections.

It remains to show that $\sigma(\mathcal{E}) = \mathcal{C}$, the Borel $\sigma$-field of subsets of $C[0,1]$. Let
us describe $\mathcal{E}$ in a different manner. Since the function
$$(u_0, u_1, u_2, \dots, u_d) \rightsquigarrow (u_0, (u_1 - u_0, u_2 - u_1, \dots, u_d - u_{d-1}))$$
is continuous and one-to-one from $\mathbb{R}^{d+1}$ onto $\mathbb{R}^{d+1}$ with a continuous inverse, $\mathcal{E}$
may be described as the class of sets of the form
$$\{x \in C[0,1]: (x(0), x(t_1), \dots, x(t_d)) \in C\}, \tag{19.4}$$
where $d$ is a positive integer, $0 < t_1 < \dots < t_d \le 1$, and $C$ is a Borel subset of
$\mathbb{R}^{d+1}$. To show that $\sigma(\mathcal{E}) = \mathcal{C}$, we will show that any set
$$\{x \in C[0,1]: \rho(y,x) < \varepsilon\}, \quad y \in C[0,1],\ \varepsilon > 0, \tag{19.5}$$
can be written in terms of countable unions and intersections of sets of the form
(19.4). Since $\mathcal{C}$ is the smallest $\sigma$-field containing sets of the form (19.5), it will
follow that $\sigma(\mathcal{E}) = \mathcal{C}$. Therefore, the following set-theoretic equality completes
the proof of uniqueness:

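$$\{x \in C[0,1]: \rho(y,x) < \varepsilon\} = \bigcup_{k=1}^{\infty} \bigcap_{m=1}^{\infty} \bigcap_{i=0}^{2^m} \Big\{x: |x(i2^{-m}) - y(i2^{-m})| < \varepsilon - \frac{1}{k}\Big\}. \qquad \square$$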


Any $C[0,1]$-valued random variable satisfying properties (i)-(iii) of the preceding
lemma is called a Brownian motion on $[0,1]$ or a Wiener process on $[0,1]$.
The lemma asserts that all Wiener processes have the same distribution and
that all subsequential limits of $(X^{(n)}: n = 1,2,\dots)$ have the same distribution.
The common distribution of all Wiener processes on $[0,1]$, if any exist, is called
Wiener measure on $[0,1]$. We have yet to prove the existence of Wiener measure
or of subsequential distributional limits of the sequence $(X^{(n)}: n = 1,2,\dots)$.




19.2. The existence of and convergence to Wiener measure 


In view of the Prohorov Theorem and Proposition 15 of Chapter 18, we may
resolve the issues of the preceding paragraph by proving that the sequence
$\{Q_n: n = 1,2,\dots\}$ is uniformly tight, where $Q_n$ is the distribution of $X^{(n)}$,
which itself is defined by the sentence containing (19.1). The uniform tightness
will be established by a sequence of four lemmas. The first lemma concerns the
$Q_n$-probability that $x(s)$ differs from $x(\frac{j}{m})$ by more than some fixed positive
quantity $z$, where $s$ is a time in the interval $[\frac{j}{m}, \frac{j+1}{m}]$.


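Lemma 3. For $z > 0$ and $j = 0,1,\dots$,
$$\lim_{m\to\infty} \limsup_{n\to\infty} \Big(m \sup\big\{Q_n(\{x: |x(s) - x(\tfrac{j}{m})| > z\}): \tfrac{j}{m} \le s \le \tfrac{j+1}{m}\big\}\Big) = 0. \tag{19.6}$$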


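PROOF. We express the probability in (19.6) in terms of the $C[0,1]$-valued
random variable $X^{(n)}$:
$$Q_n(\{x: |x(s) - x(\tfrac{j}{m})| > z\}) = P[|X^{(n)}_s - X^{(n)}_{j/m}| > z].$$
Let $k_1(n)$, $k_2(n)$ satisfy $\frac{k_1(n)-1}{n} < \frac{j}{m} \le \frac{k_1(n)}{n}$ and $\frac{k_2(n)-1}{n} < \frac{j+1}{m} \le \frac{k_2(n)}{n}$. Because
the $C[0,1]$-valued random variable $X^{(n)}$ is linear on each interval of the
form $[\frac{k-1}{n}, \frac{k}{n}]$, $k = 1,\dots,n$, the supremum in (19.6) is bounded above by the
supremum over $s$ of the form $s = \frac{k}{n}$, $k_1(n) \le k \le k_2(n)$. For such $s$,
$$\begin{aligned}
P[|X^{(n)}_s - X^{(n)}_{j/m}| > z]
&\le P[|X^{(n)}_s - X^{(n)}_{k_1(n)/n}| > \tfrac{z}{2}] + P[|X^{(n)}_{k_1(n)/n} - X^{(n)}_{j/m}| > \tfrac{z}{2}] \\
&= P[|X^{(n)}_s - X^{(n)}_{k_1(n)/n}| > \tfrac{z}{2};\ |X^{(n)}_{k_2(n)/n} - X^{(n)}_{k_1(n)/n}| > \tfrac{z}{4}] \\
&\quad + P[|X^{(n)}_s - X^{(n)}_{k_1(n)/n}| > \tfrac{z}{2};\ |X^{(n)}_{k_2(n)/n} - X^{(n)}_{k_1(n)/n}| \le \tfrac{z}{4}] \\
&\quad + P[|X^{(n)}_{k_1(n)/n} - X^{(n)}_{j/m}| > \tfrac{z}{2}] \\
&\le P[|X^{(n)}_{k_2(n)/n} - X^{(n)}_{k_1(n)/n}| > \tfrac{z}{4}] \\
&\quad + P[|X^{(n)}_s - X^{(n)}_{k_1(n)/n}| > \tfrac{z}{2};\ |X^{(n)}_{k_2(n)/n} - X^{(n)}_s| > \tfrac{z}{4}] \\
&\quad + P[|X^{(n)}_{k_1(n)/n} - X^{(n)}_{j/m}| > \tfrac{z}{2}].
\end{aligned} \tag{19.7}$$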

We wish to obtain upper bounds for the three terms in (19.7). For the first
term, we use the fact that $X^{(n)}$ is constructed by taking sums of iid random
variables with finite variance. Let $Z$ and $G$, respectively, denote a standard
normally distributed random variable and its distribution function. The Classical
Central Limit Theorem, in combination with Problem 2, tells us that
$(X^{(n)}_{k_2(n)/n} - X^{(n)}_{k_1(n)/n}) \xrightarrow{\mathcal{D}} Z/\sqrt{m}$ as $n \to \infty$. Thus,
$$\lim_{n\to\infty} P[|X^{(n)}_{k_2(n)/n} - X^{(n)}_{k_1(n)/n}| > \tfrac{z}{4}] = 2\big(1 - G(\tfrac{z\sqrt{m}}{4})\big).$$
It is easily shown using the l’Hospital Rule that $(1 - G(x)) < x^{-4}$ for sufficiently
large $x$. It follows that for large $m$,
$$\lim_{n\to\infty} P[|X^{(n)}_{k_2(n)/n} - X^{(n)}_{k_1(n)/n}| > \tfrac{z}{4}] \le \frac{512}{z^4 m^2}. \tag{19.8}$$
We use the independence of $(X^{(n)}_s - X^{(n)}_{k_1(n)/n})$ and $(X^{(n)}_{k_2(n)/n} - X^{(n)}_s)$ to factor
the second term of (19.7) into the product of two probabilities, and then apply
the Chebyshev Inequality to each factor:
$$P[|X^{(n)}_s - X^{(n)}_{k_1(n)/n}| > \tfrac{z}{2}]\; P[|X^{(n)}_{k_2(n)/n} - X^{(n)}_s| > \tfrac{z}{4}]
\le \frac{64}{z^4}\Big(\frac{1}{m} + \frac{1}{n}\Big)^2. \tag{19.9}$$
The final term in (19.7) goes to 0 as $n \to \infty$, since
$$P[|X^{(n)}_{k_1(n)/n} - X^{(n)}_{j/m}| > \tfrac{z}{2}] \le \frac{4}{n z^2}$$
by the Chebyshev Inequality and the definition of $k_1(n)$. Using this fact together
with (19.8) and (19.9), we conclude that
$$\limsup_{n\to\infty} P[|X^{(n)}_s - X^{(n)}_{j/m}| > z] \le \frac{c}{m^2},$$
where $c$ is a constant that does not depend on $m$. Multiplying by $m$ and letting
$m \to \infty$ gives (19.6). □


The next lemma improves on the previous one by moving the supremum over 
s into the event under consideration. 


Lemma 4. For $z > 0$ and $j = 0,1,\dots$,
$$\lim_{m\to\infty} \limsup_{n\to\infty} m\, Q_n\big(\{x: \sup\{|x(s) - x(\tfrac{j}{m})|: \tfrac{j}{m} \le s \le \tfrac{j+1}{m}\} > z\}\big) = 0.$$


Problem 3. Prove the preceding lemma. Hint: Arrange to use the Etemadi Lemma
to bring the supremum of Lemma 3 inside.


We need one further improvement, which is obtained by taking the union over
$j = 0,\dots,m-1$ and then using the triangle inequality, so that the supremum can
be taken over pairs of times $t$, $u$ such that $|t - u| \le \frac{1}{m}$. The resulting probabilistic
statement about uniform continuity provides just what we need to prove uniform
tightness in Lemma 6.




Lemma 5. For $z > 0$,
$$\lim_{m\to\infty} \limsup_{n\to\infty} Q_n\big(\{x: \sup\{|x(u) - x(t)|: |t - u| \le \tfrac{1}{m}\} > z\}\big) = 0.$$


PROOF. Fix $z > 0$. By Lemma 4
$$\lim_{m\to\infty} \limsup_{n\to\infty} Q_n\Big(\bigcup_{j=0}^{m-1} \big\{x: \max\{|x(s) - x(\tfrac{j}{m})|: \tfrac{j}{m} \le s \le \tfrac{j+1}{m}\} > z\big\}\Big) = 0.$$
The proof is now completed by replacing $z$ by $\frac{z}{3}$ in this last equality, and noting
that if $|x(u) - x(t)| > z$ for some $t$ and $u$ such that $|t - u| \le \frac{1}{m}$, then by the
triangle inequality, there exists an integer $j \in \{0,1,\dots,m-1\}$ and a real number
$s \in [\frac{j}{m}, \frac{j+1}{m}]$ such that $|x(s) - x(\frac{j}{m})| > \frac{z}{3}$. □


Lemma 6. The sequence $\{Q_n: n = 1,2,\dots\}$ is uniformly tight.


PROOF. Fix $\varepsilon > 0$. We need to show the existence of a Borel subset $A$ of
$C[0,1]$ which has compact closure and for which $Q_n(A^c) < \varepsilon$ for every $n$. We
will only include in $A$ functions whose value at 0 is 0. Then, by the Arzela-Ascoli
Theorem, in order for $A$ to have compact closure it is necessary and sufficient
that $A$ be a uniformly equicontinuous set of functions.

By Lemma 5 there exist, for each $k = 1,2,\dots$, integers $p_k$ and $r_k$ such that
$$Q_n\big(\{x: \max\{|x(u) - x(t)|: |t - u| \le \tfrac{1}{m}\} > \tfrac{1}{k}\}\big) < \varepsilon 2^{-k} \tag{19.10}$$
for $m = p_k$ and $n \ge r_k$. By monotonicity in $m$, (19.10) holds for $m \ge p_k$ and
$n \ge r_k$. By the Continuity of Measure Theorem, there exists an integer $q_k$ such
that (19.10) also holds for $m \ge q_k$ and $n < r_k$. Set $m_k = p_k \vee q_k$. Then, for each
$n$ and $k$, (19.10) holds whenever $m = m_k$.

Let
$$A = \bigcap_{k=1}^{\infty} \big\{x: x(0) = 0 \text{ and } \max\{|x(u) - x(t)|: |t - u| \le \tfrac{1}{m_k}\} \le \tfrac{1}{k}\big\}.$$
From (19.10), $Q_n(A^c) < \varepsilon$ for every $n$. It remains to show that $A$ is an equicontinuous
set of functions. Let $\gamma > 0$. Choose $k$ so that $\frac{1}{k} < \gamma$ and set $\delta = \frac{1}{m_k}$.
Then, for $x \in A$,
$$\max\{|x(u) - x(t)|: |t - u| \le \delta\} < \gamma,$$
as desired. □


The uniform tightness that has now been established implies that every subse- 
quence of (Qn) has a further subsequence that converges. In view of Lemma 2 the 
existence of such a convergent subsequence establishes the existence of Wiener 
measure Q on C[0, 1] (and, as already contained in Lemma 2, Wiener measure is 
unique). Also by Lemma 2, all limits of convergent subsequences are identical, 
and hence by Proposition 15 of Chapter 18, $Q_n \to Q$ as $n \to \infty$. We have thus
proved the next two theorems. 




Theorem 7. Wiener measure on C[0,1] exists and is unique. 


Theorem 8. [Donsker Invariance Principle] Let $(Y_1, Y_2, \dots)$ be an iid sequence
of $\mathbb{R}$-valued random variables having finite mean $\mu$ and positive finite
variance $\sigma^2$. For $n = 1,2,\dots$, define $X^{(n)} \in C[0,1]$ as follows: for $m = 0,\dots,n$,
set
$$X^{(n)}_{m/n} = \frac{1}{\sigma\sqrt{n}} \sum_{k=1}^{m} (Y_k - \mu),$$
and then extend $X^{(n)}$ to all of $[0,1]$ by linearity on each of the intervals $[\frac{m-1}{n}, \frac{m}{n}]$,
$m = 1,\dots,n$. Let $Q_n$ denote the distribution of $X^{(n)}$. Then $Q_n \to Q$ as $n \to \infty$
where $Q$ is Wiener measure on $C[0,1]$.


The term ‘invariance’ in the name of the preceding theorem refers to the fact that
the conclusion depends on the distribution of the $Y_k$ only through its mean and
variance.


19.3. Some measurable functionals on C[0, 1] 


Because $(C[0,1], \mathcal{C})$ is itself a space of functions, we will use the term ‘functional’
for any $\mathbb{R}$-valued function having $C[0,1]$ as its domain. We will treat any one such
functional as several random variables by placing different probability measures
on the measurable space $(C[0,1], \mathcal{C})$.

One measurable functional we will treat is $M$ defined by
$$M(x) = \max\{x(t): 0 \le t \le 1\}.$$
The distribution $R$ of $M$ under Wiener measure $Q$ equals the distribution of the
random variable $M \circ W$, where $W$ is a Wiener process. Similarly, if $Q_n$ denotes
the distribution of $X^{(n)}$, where $X^{(n)}$ is defined in terms of a random walk as in
the statement of the Donsker Invariance Principle, then the distribution $R_n$ of
$M$ under $Q_n$ is the distribution of $M \circ X^{(n)}$. The functional $M$ is continuous,
so by Proposition 10 of Chapter 18, $R_n \to R$. (This observation is also made in
Problem 9 of Chapter 18.) Thus, knowledge of $R$ yields knowledge of the limiting
behavior of the sequence $(R_n)$, and conversely; we will use both directions.

A second functional $J$ defined by
$$J(x) = \int_0^1 x(t)\,dt$$
will be treated in a problem; it is obviously continuous.

A third functional $K$ will also be studied: $K(x)$ equals the Lebesgue measure
of $\{t \in [0,1]: x(t) > 0\}$. We will prove that $K$ is measurable, although it is not
continuous. Despite this lack of continuity, we will show that the sequence of
distributions of $K$ under $Q_n$ converges to that of $K$ under $Q$.

Returning to the functional $M$, we will use the ‘reflection principle’ to calculate
its distribution under $Q_n$, where $Q_n$ is induced, as in the Donsker Invariance
Principle, from an iid sequence the common distribution of which assigns probability
1/2 to each of $\{1\}$ and $\{-1\}$. It is clear that $Q_n$ assigns probability 1 to
the set of functions $x$ for which $M(x) = x(\frac{m}{n})$ for some integer $m$ depending on
$x$; we will use this fact without comment below.

Fix a positive integer $n$. For $c$ a positive integral multiple of $\frac{1}{\sqrt{n}}$ and $m =
1,2,\dots,n$, set
$$D_m = \{x \in C[0,1]: x(\tfrac{m}{n}) = c > x(\tfrac{i}{n}) \text{ for } i < m\}.$$
Then
$$Q_n(\{x: M(x) \ge c\}) = Q_n\Big(\bigcup_{m=1}^{n} D_m\Big) = \sum_{m=1}^{n} Q_n(D_m).$$


m=l 


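We also have
$$\begin{aligned}
Q_n(D_m) &= Q_n(D_m \cap \{x: x(1) - x(\tfrac{m}{n}) > 0\}) \\
&\quad + Q_n(D_m \cap \{x: x(1) - x(\tfrac{m}{n}) < 0\}) \\
&\quad + Q_n(D_m \cap \{x: x(1) - x(\tfrac{m}{n}) = 0\}) \\
&= Q_n(D_m)\, Q_n(\{x: x(1) - x(\tfrac{m}{n}) > 0\}) \\
&\quad + Q_n(D_m)\, Q_n(\{x: x(1) - x(\tfrac{m}{n}) < 0\}) \\
&\quad + Q_n(D_m \cap \{x: x(1) - x(\tfrac{m}{n}) = 0\}) \\
&= 2\,Q_n(D_m \cap \{x: x(1) - x(\tfrac{m}{n}) > 0\}) \\
&\quad + Q_n(D_m \cap \{x: x(1) - x(\tfrac{m}{n}) = 0\}).
\end{aligned}$$
When we sum over $m$ we obtain
$$Q_n(\{x: M(x) \ge c\}) = 2\,Q_n(\{x: x(1) > c\}) + Q_n(\{x: x(1) = c\}). \tag{19.11}$$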
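
The identity (19.11) can be checked exactly for small $n$ by brute-force enumeration of all paths of the fair ±1 walk. The sketch below works with the unscaled walk, so that the level `a` corresponds to $c\sqrt{n}$; the function name `reflection_check` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def reflection_check(n, a):
    """Return both sides of P[max_m S_m >= a] = 2 P[S_n > a] + P[S_n = a] for a fair +/-1 walk."""
    hit = above = equal = 0
    for steps in product((-1, 1), repeat=n):
        pos, peak = 0, 0
        for step in steps:
            pos += step
            peak = max(peak, pos)
        hit += peak >= a      # the path reaches level a at some time
        above += pos > a      # the path ends strictly above a
        equal += pos == a     # the path ends exactly at a
    total = 2 ** n
    return Fraction(hit, total), Fraction(2 * above + equal, total)

lhs, rhs = reflection_check(n=10, a=2)
print(lhs, rhs)
```

Because the enumeration uses exact rational arithmetic, agreement of the two returned values is an exact verification for the chosen `n` and `a`, not a statistical one.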


Now fix $c$ to be positive and rational, and thus an integral multiple of $\frac{1}{\sqrt{n}}$ for
an infinite sequence of positive integers $n$. Letting $n \to \infty$ through that sequence,
the Donsker Invariance Principle implies that the left side of (19.11) converges
to $Q(\{x: M(x) \ge c\})$. The set $\{x: x(1) = c\}$ is closed in $C[0,1]$. It also equals
the boundary in $C[0,1]$ of the set $\{x: x(1) > c\}$. Since $Q(\{x: x(1) = c\}) = 0$,
the Portmanteau Theorem of Chapter 18 implies that the right side of (19.11)
converges to $2Q(\{x: x(1) > c\})$. Thus
$$Q(\{x: M(x) \ge c\}) = 2Q(\{x: x(1) > c\}) = \sqrt{\frac{2}{\pi}} \int_c^{\infty} e^{-u^2/2}\,du.$$


We have thus proved the following two consequences of the Donsker Invariance 
Principle. 


Theorem 9. Let $Q$ denote Wiener measure on $C[0,1]$ and set
$$M(x) = \max\{x(t): 0 \le t \le 1\}, \quad x \in C[0,1].$$
Then, for $c \ge 0$,
$$Q(\{x: M(x) \le c\}) = \sqrt{\frac{2}{\pi}} \int_0^c e^{-u^2/2}\,du.$$


Theorem 10. Let $(Y_k: k = 1,2,\dots)$ be an iid sequence of $\mathbb{R}$-valued random
variables having finite mean $\mu$ and positive finite variance $\sigma^2$. Then, for $c \ge 0$,
$$\lim_{n\to\infty} P\Big[\max\Big\{\frac{1}{\sigma\sqrt{n}} \sum_{k=1}^{m} (Y_k - \mu): m = 1,\dots,n\Big\} \le c\Big] = \sqrt{\frac{2}{\pi}} \int_0^c e^{-u^2/2}\,du.$$
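
For fair ±1 steps the probability in Theorem 10 can be computed exactly from binomial coefficients via the reflection identity (19.11), and then compared with the limit. The following is a sketch under that assumption; `max_cdf_simple_walk` and `limit_cdf` are hypothetical names, and the limit is written with the error function, using $\sqrt{2/\pi}\int_0^c e^{-u^2/2}\,du = \operatorname{erf}(c/\sqrt{2})$.

```python
import math

def max_cdf_simple_walk(n, c):
    """P[max_{m<=n} S_m <= c*sqrt(n)] for a fair +/-1 walk, via the reflection identity."""
    a = math.floor(c * math.sqrt(n)) + 1   # smallest integer level strictly above c*sqrt(n)
    if a > n:
        return 1.0
    # P[max >= a] = 2 P[S_n > a] + P[S_n = a], where S_n = 2*heads - n.
    total = 0
    for heads in range(n + 1):
        s = 2 * heads - n
        if s > a:
            total += 2 * math.comb(n, heads)
        elif s == a:
            total += math.comb(n, heads)
    return 1.0 - total / 2 ** n

def limit_cdf(c):
    """sqrt(2/pi) * integral_0^c exp(-u^2/2) du, expressed through erf."""
    return math.erf(c / math.sqrt(2.0))

for n in (100, 2500):
    print(n, max_cdf_simple_walk(n, 1.0), limit_cdf(1.0))
```

The discrepancy for finite `n` comes mostly from the lattice structure of the walk and shrinks as `n` grows.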

Notice the general scheme underlying the preceding discussion. Start with 
some easy-to-analyze sequence of iid random variables having finite nonzero 
variance. Obtain some result for its partial sums. Go to the limit to obtain 
a corresponding result for the Wiener process. Then obtain a limit result for the 
sequence of partial sums of an arbitrary iid sequence, with no restriction on the 
common distribution of the summands other than that of positive finite variance. 
For some calculations, such as the one in the following problem, there is no need 
to start with a particular sequence of iid random variables; instead one does the 
initial computation directly with Wiener measure or a Brownian motion. 


* Problem 4. Let $Q$ denote Wiener measure. Find the distribution under $Q$ of the
functional $J$, where
$$J(x) = \int_0^1 x(t)\,dt.$$
Then apply the Donsker Invariance Principle in connection with this distribution.


In preparation for treating the functional K introduced earlier, we obtain the 
following result, which is itself quite interesting. It says that the set of times 
when a Wiener process equals 0 has Lebesgue measure 0 a.s. 


Proposition 11. Let $Q$ denote Wiener measure and let $\lambda$ denote Lebesgue
measure on $[0,1]$. Then
$$Q(\{x: \lambda(\{t \in [0,1]: x(t) = 0\}) = 0\}) = 1.$$


PROOF. The function $(t,x) \rightsquigarrow x(t)$ defined on $[0,1] \times C[0,1]$ is continuous
as a function of $t$ for each fixed $x$ and also as a function of $x$ for each fixed
$t$. By Proposition 9 of Chapter 9 it is measurable as a function of $(t,x)$, and
hence the indicator function of $\{(t,x): x(t) = 0\}$ is measurable. Therefore, we
may apply the Fubini Theorem to this indicator function. Since nondegenerate
normal distributions assign measure 0 to every one-point set, we obtain 0 by
integrating first with respect to $Q$. Thus, we must also get 0 when we integrate
first with respect to $\lambda$; that is,
$$\int_{C[0,1]} \Big(\int_{[0,1]} I_{\{(t,x):\, x(t) = 0\}}\,\lambda(dt)\Big)\,dQ = 0.$$
Hence, the inside integral must equal 0 for almost every $x \in C[0,1]$. This fact
finishes the proof since the inside integral equals $\lambda(\{t: x(t) = 0\})$. □


With $\lambda$ continuing to denote Lebesgue measure, the definition of the functional
$K$ can be written as
$$K(x) = \lambda(\{t: x(t) > 0\}), \tag{19.12}$$
and we also define
$$\widetilde{K}(x) = \lambda(\{t: x(t) \ge 0\}).$$


Problem 5. Prove that $K$ and $\widetilde{K}$ are measurable functionals.


Problem 6. Prove that the boundary in $C[0,1]$ of $\{x: K(x) \le c\}$ equals
$$\{x: K(x) \le c \le \widetilde{K}(x)\}$$
for each $c \in \mathbb{R}$.


By Proposition 11 and the preceding problem, the $Q$-measure of the boundary
of $\{x: K(x) \le c\}$ equals 0 for every $c$ that is a continuity point of the distribution
function of $K$. By the Portmanteau Theorem,
$$\lim_{n\to\infty} Q_n(\{x: K(x) \le c\}) = Q(\{x: K(x) \le c\})$$
for such $c$. Accordingly, we may proceed for $K$ as we did for $M$: first calculating
the distribution of $K$ under the measure $Q_n$ induced by an iid sequence the
common distribution of which assigns probability 1/2 to each of $\{1\}$ and $\{-1\}$,
and then taking the limit as $n \to \infty$ to obtain a fact about the Wiener process
and a limit theorem.


Lemma 12. For positive even integers $k$,
$$\sum_{\substack{j=2 \\ j \text{ even}}}^{k} \frac{1}{j-1} \binom{j}{j/2} \binom{k-j}{(k-j)/2} = \binom{k}{k/2}.$$


PROOF. Divide both sides by $2^k$ to obtain the following equivalent statement
$$\sum_{\substack{j=2 \\ j \text{ even}}}^{k} \Big[\frac{1}{j-1} \binom{j}{j/2} \frac{1}{2^j}\Big] \Big[\binom{k-j}{(k-j)/2} \frac{1}{2^{k-j}}\Big] = \binom{k}{k/2} \frac{1}{2^k}. \tag{19.13}$$
The right side equals the probability that a simple symmetric random walk
equals 0 at time $k$. From Example 5 of Chapter 11, we see that the $j$-th term in
the left side of (19.13) equals the probability that a simple symmetric random
walk returns to 0 for the first time at time $j$ and also equals 0 at time $k$ (the
possibility that $k = j$ not being excluded). By finite additivity of probability
measures, (19.13) follows. □
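
The combinatorial identity of Lemma 12 is easy to confirm numerically for small even $k$. The sketch below uses exact rational arithmetic; the function name `lemma12_sides` is hypothetical.

```python
from fractions import Fraction
from math import comb

def lemma12_sides(k):
    """Return (left side, right side) of the identity in Lemma 12 for a positive even integer k."""
    if k <= 0 or k % 2:
        raise ValueError("k must be a positive even integer")
    left = sum(Fraction(comb(j, j // 2), j - 1) * comb(k - j, (k - j) // 2)
               for j in range(2, k + 1, 2))
    return left, comb(k, k // 2)

for k in (2, 4, 6, 20):
    print(k, lemma12_sides(k))
```

Such a check is of course no substitute for the probabilistic proof, but it guards against misreading the indices in the sum.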




Theorem 13. Let $n$ be an even positive integer and $Q_n$ the distribution of
$X^{(n)}$ as defined in the sentence containing (19.1) with the distribution of each
$Y_k$ assigning the value 1/2 to $\{1\}$ and to $\{-1\}$. Then,
$$Q_n(\{x: K(x) = \tfrac{m}{n}\}) = \begin{cases} \dbinom{m}{m/2} \dbinom{n-m}{(n-m)/2} \dfrac{1}{2^n} & \text{for } m = 0,2,4,\dots,n \\[1ex] 0 & \text{otherwise,} \end{cases}$$
where $K$ is defined by (19.12).
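
For small even $n$ the formula of Theorem 13 can be compared with a direct enumeration of all $2^n$ paths. In the sketch below (hypothetical names `positive_time_distribution` and `theorem13_formula`), $nK(x)$ is computed as the number of linear segments lying above the axis, which for a ±1 walk is the number of segments having an endpoint strictly greater than 0.

```python
from fractions import Fraction
from itertools import product
from math import comb

def positive_time_distribution(n):
    """Exact distribution of n*K under Q_n for fair +/-1 steps, by enumerating all paths."""
    counts = [0] * (n + 1)
    for steps in product((-1, 1), repeat=n):
        pos, positive_segments = 0, 0
        for step in steps:
            nxt = pos + step
            # The segment from pos to nxt lies above the axis iff one endpoint is > 0.
            if pos > 0 or nxt > 0:
                positive_segments += 1
            pos = nxt
        counts[positive_segments] += 1
    return [Fraction(c, 2 ** n) for c in counts]

def theorem13_formula(n, m):
    """Probability that K = m/n according to Theorem 13."""
    if m % 2:
        return Fraction(0)
    return Fraction(comb(m, m // 2) * comb(n - m, (n - m) // 2), 2 ** n)

print(positive_time_distribution(4))
print([theorem13_formula(4, m) for m in range(5)])
```

The two printed lists should coincide entry by entry; the extreme values $m = 0$ and $m = n$ are the most likely ones, which already hints at the arcsine shape of the limit.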


PARTIAL PROOF. We will use induction on even positive $n$. The verification
for $n = 2$ is trivial. Suppose that $n \ge 4$ and that the statement of interest is
true for every positive even integer less than $n$. We only consider $m$ for which
$0 < m < n$, leaving the two extreme cases $m = 0$ and $m = n$ for the reader. For
$x \in C[0,1]$, let
$$T(x) = \inf\{t > 0: x(t) = 0\}.$$
By finite additivity, the independence of the steps of a random walk, symmetry,
and the distribution of $T$ given in Example 5 of Chapter 11,


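$$\begin{aligned}
Q_n(\{x: K(x) = \tfrac{m}{n}\})
&= \sum_{\substack{j=2 \\ j \text{ even}}}^{m} Q_n(\{x: T(x) = \tfrac{j}{n},\ x(\tfrac{1}{n}) > 0,\ K(x) = \tfrac{m}{n}\}) \\
&\quad + \sum_{\substack{j=2 \\ j \text{ even}}}^{n-m} Q_n(\{x: T(x) = \tfrac{j}{n},\ x(\tfrac{1}{n}) < 0,\ K(x) = \tfrac{m}{n}\}) \\
&= \frac{1}{2} \sum_{\substack{j=2 \\ j \text{ even}}}^{m} Q_n(\{x: T(x) = \tfrac{j}{n}\})\, Q_{n-j}(\{x: K(x) = \tfrac{m-j}{n-j}\}) \\
&\quad + \frac{1}{2} \sum_{\substack{j=2 \\ j \text{ even}}}^{n-m} Q_n(\{x: T(x) = \tfrac{j}{n}\})\, Q_{n-j}(\{x: K(x) = \tfrac{m}{n-j}\}) \\
&= \frac{1}{2^{n+1}} \binom{n-m}{(n-m)/2} \sum_{\substack{j=2 \\ j \text{ even}}}^{m} \frac{1}{j-1} \binom{j}{j/2} \binom{m-j}{(m-j)/2} \\
&\quad + \frac{1}{2^{n+1}} \binom{m}{m/2} \sum_{\substack{j=2 \\ j \text{ even}}}^{n-m} \frac{1}{j-1} \binom{j}{j/2} \binom{n-m-j}{(n-m-j)/2},
\end{aligned}$$
which in combination with Lemma 12 completes the argument for $0 < m < n$. □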


Problem 7. From the above presentation it appears that a formula from Example 5 
of Chapter 11 has been used twice—once in the proof of the theorem itself and once 
in the proof of the lemma that was used in the proof of the theorem. However, we 
did not actually need the formula from Chapter 11. Explain. 


Problem 8. Treat the cases m = 0 and m = n which were not treated in the partial 
proof of Theorem 13. 


Theorem 14. [Arcsin Law (for Time Spent Positive)] Let $Q$ denote Wiener
measure on $C[0,1]$ and define $K$ by (19.12). Then, for $c \in [0,1]$,
$$Q(\{x: K(x) \le c\}) = \frac{2}{\pi} \arcsin \sqrt{c}.$$
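
The passage from Theorem 13 to the arcsine law can be illustrated numerically, without giving the proof requested in Problem 9. The sketch below sums the exact probabilities of Theorem 13 (fair ±1 steps, $n$ even) and compares the result with $\frac{2}{\pi}\arcsin\sqrt{c}$; the function names are hypothetical.

```python
import math

def discrete_positive_time_cdf(n, c):
    """Q_n({x: K(x) <= c}) computed from the formula of Theorem 13 (n even, fair +/-1 steps)."""
    total = sum(math.comb(m, m // 2) * math.comb(n - m, (n - m) // 2)
                for m in range(0, n + 1, 2) if m <= c * n)
    return total / 2 ** n

def arcsine_cdf(c):
    """(2/pi) * arcsin(sqrt(c)) for 0 <= c <= 1."""
    return (2.0 / math.pi) * math.asin(math.sqrt(c))

for n in (20, 200, 2000):
    print(n, discrete_positive_time_cdf(n, 0.25), arcsine_cdf(0.25))
```

At $c = 1/4$ the limiting value is exactly $1/3$, which makes the convergence easy to see in the printed output.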


Problem 9. Prove the preceding theorem. 


Corollary 15. Let $(S_0 = 0, S_1, S_2, \dots)$ be a random walk in $\mathbb{R}$ whose steps
have mean 0 and positive finite variance. Then,
$$\lim_{n\to\infty} P\Big[\frac{\#\{m \le n: S_m > 0\}}{n} \le c\Big] = \frac{2}{\pi} \arcsin \sqrt{c}$$
for $0 \le c \le 1$.


Problem 10. Prove the preceding corollary. Hint: You may find Proposition 11 
useful. 


* Problem 11. For a Wiener process $W$ calculate the expectation of
$$\lambda(\{t \in [0,1]: W_t > 0\}) \vee \lambda(\{t \in [0,1]: W_t < 0\}),$$
where $\lambda$ denotes Lebesgue measure.


19.4. Brownian motion on [0, ∞)


According to Problem 5 of Chapter 18, $C[0,\infty)$ can be regarded as a Polish space;
convergence in it is equivalent to uniform convergence on bounded intervals.

Here is one way to construct a $C[0,\infty)$-valued random variable that has stationary
independent increments and is a Wiener process when restricted to $[0,1]$.
Let $(W^{(n)}: n = 0,1,\dots)$ be an iid sequence of $C[0,1]$-valued Wiener processes,
and for $n = 0,1,\dots$ and $t \in [n, n+1)$ define
$$W_t = \sum_{k=0}^{n-1} W^{(k)}_1 + W^{(n)}_{t-n}.$$


The random function $W$ thus constructed is called Brownian motion on $[0,\infty)$
or a Wiener process on $[0,\infty)$. Its distribution is Wiener measure on $C[0,\infty)$,
or simply Wiener measure when there is no likelihood of confusion with Wiener
measure on $C[0,1]$. Without regard to the above construction, any $C[0,\infty)$-valued
random variable whose distribution is Wiener measure is called a Wiener
process or Brownian motion, with the modifying phrase “on $[0,\infty)$” being included
if necessary. The following problem gives an alternative construction of
Brownian motion on $[0,\infty)$.


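Problem 12. Let $\widetilde{W}$ denote a Wiener process on $[0,1]$. For $n = 1,2,\dots$, set
$$X^{(n)}_t = \sqrt{n}\,\widetilde{W}_{(t \wedge n)/n}, \quad t \in [0,\infty).$$
Prove that $X^{(n)} \xrightarrow{\mathcal{D}} W$ as $n \to \infty$, where $W$ is a Wiener process on $[0,\infty)$.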


One advantage of extending Brownian motion to $[0,\infty)$ is that Wiener measure
on $C[0,\infty)$ is invariant under a wider variety of transformations than is
Wiener measure on $C[0,1]$.


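Theorem 16. Let $W$ be a Brownian motion on $[0,\infty)$. Then each of the
following is also a Brownian motion on $[0,\infty)$:
(i) $t \rightsquigarrow -W_t$;
(ii) $t \rightsquigarrow (W_{s+t} - W_s)$, where $s \ge 0$ is fixed;
(iii) $t \rightsquigarrow \sqrt{a}\,W_{t/a}$, where $a > 0$ is fixed;
(iv) $t \rightsquigarrow \begin{cases} t\,W_{1/t} & \text{if } t > 0 \\ 0 & \text{if } t = 0. \end{cases}$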
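
A quick sanity check on the four transformations, which is not a substitute for the proof, is that each preserves the covariance $E(W_s W_t) = \min(s,t)$; this covariance follows from properties (i)-(iii) of Lemma 2. The sketch below expands each transformed covariance by bilinearity; all function names are hypothetical.

```python
from fractions import Fraction

def bm_cov(s, t):
    """Covariance E[W_s W_t] = min(s, t) of a Brownian motion."""
    return min(s, t)

def cov_symmetry(s, t):
    # Cov(-W_s, -W_t): the two sign changes cancel.
    return (-1) * (-1) * bm_cov(s, t)

def cov_time_shift(s, t, r):
    # Cov(W_{r+s} - W_r, W_{r+t} - W_r), expanded by bilinearity.
    return bm_cov(r + s, r + t) - bm_cov(r + s, r) - bm_cov(r, r + t) + bm_cov(r, r)

def cov_scaling(s, t, a):
    # Cov(sqrt(a) W_{s/a}, sqrt(a) W_{t/a}) = a * min(s/a, t/a).
    return a * bm_cov(s / a, t / a)

def cov_time_inversion(s, t):
    # Cov(s W_{1/s}, t W_{1/t}) for s, t > 0.
    return s * t * bm_cov(1 / s, 1 / t)

s, t = Fraction(1, 3), Fraction(7, 5)
print(cov_symmetry(s, t), cov_time_shift(s, t, Fraction(5, 2)),
      cov_scaling(s, t, Fraction(3, 7)), cov_time_inversion(s, t))
```

Each printed value equals $\min(s,t)$. The check says nothing about path continuity, which is exactly the point that requires extra work for the time inversion (iv).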


PARTIAL PROOF. It is easy to see that each of the first three transformed
processes has stationary independent increments and that the distribution at
any particular time $t$ is normal with mean 0 and variance $t$. By Lemma 2, it
follows that these three processes are Brownian motions.

Let $V$ be the fourth process obtained from $W$ as described in the theorem. It
is easily checked that for each $t > 0$, $V_t$ is normally distributed with mean 0 and
variance $t$. The independence of the increments is also obvious. It follows from
these two facts that the variance of $(V_t - V_s)$ equals $|t - s|$ and, therefore, that
the increments are stationary. It would seem that we have checked the necessary
conditions in order to apply Lemma 2, but this is not so, because we still need
to show that $V$ is $C[0,\infty)$-valued (the corresponding fact being obvious for the
first three transformations).

Clearly, $t \rightsquigarrow V_t(\omega)$ is continuous on $(0,\infty)$ for each $\omega$. We need to confirm
almost sure continuity at 0. Equivalently, we must show that
$$\lim_{t\to\infty} \frac{W_t}{t} = 0 \text{ a.s.}$$
Since $W$ has stationary independent increments with mean 0, it follows immediately
from the Strong Law of Large Numbers that
$$\lim_{\substack{n\to\infty \\ n \in \mathbb{Z}^+ \setminus \{0\}}} \frac{W_n}{n} = 0 \text{ a.s.}$$
So to complete the proof, it is sufficient to show that
$$\lim_{n\to\infty} \frac{\max_{n \le t \le n+1} |W_t - W_n|}{n} = 0 \text{ a.s.}$$
For $c > 0$ and $n = 1,2,\dots$, set
$$A_{c,n} = \{\omega: \max_{n \le t \le n+1} |W_t(\omega) - W_n(\omega)| > cn\}.$$
The desired almost sure convergence is equivalent to $P(\limsup_n A_{c,n}) = 0$ for
all $c > 0$. By the Borel Lemma, it is enough to show that $\sum_{n=1}^{\infty} P(A_{c,n}) < \infty$ for
$c > 0$. Since $W$ has stationary increments,
$$P(A_{c,n}) = P[\max_{0 \le t \le 1} |W_t| > cn].$$
It is left for the reader to use Theorem 9 to verify that $\sum_{n=1}^{\infty} P(A_{c,n}) < \infty$. □


Problem 13. Make the calculation to which the last sentence of the preceding proof 
refers. 


The transformations in the preceding theorem are sometimes given the fol- 
lowing names: (i) symmetry or spatial symmetry; (ii) time shift; (iii) scaling or 
change of scale; (iv) time inversion. Note that in the course of showing that 
Brownian motion is invariant under time inversion, we also proved the following: 


Proposition 17. [Strong Law for Brownian motion]
$$\lim_{t\to\infty} \frac{W_t}{t} = 0 \text{ a.s.}$$


Problem 14. Use Theorem 9 and Theorem 16 to prove that a Wiener process $W$
on $[0,\infty)$ has the following two properties with probability 1: (i) There exists a
strictly decreasing random sequence $(T_k: k = 1,2,\dots)$ converging to 0 a.s. such that
$W_{T_k(\omega)}(\omega) > 0$ for every $k$; (ii) There exists a strictly increasing random sequence
$(T_k: k = 1,2,\dots)$ approaching $\infty$ a.s. such that $W_{T_k(\omega)}(\omega) > 0$ for every $k$.


Problem 15. Use symmetry and the preceding problem to draw two further con- 
clusions about Wiener processes. 




19.5. Filtrations and stopping times 


We adapt to the $C[0,\infty)$-setting some concepts that were introduced for random
sequences in Chapter 11.


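Definition 18. Let $(\Omega, \mathcal{F})$ denote a measurable space and let $\mathcal{F}_t$ be a sub-
$\sigma$-field of $\mathcal{F}$ for each $t \in [0,\infty)$. Then $(\mathcal{F}_t : t \in [0,\infty))$ is a filtration in $(\Omega, \mathcal{F})$
if $(s < t) \Rightarrow (\mathcal{F}_s \subseteq \mathcal{F}_t)$. Let $Y$ be a $C[0,\infty)$-valued random variable defined on
$(\Omega, \mathcal{F})$. Then $Y$ is adapted to a filtration $(\mathcal{F}_t : t \in [0,\infty))$ if, for each $t$, $Y_t$ is
measurable with respect to $\mathcal{F}_t$. Corresponding to a filtration $(\mathcal{F}_t : t \in [0,\infty))$,
we use the notation

\[ \mathcal{F}_\infty = \sigma(\mathcal{F}_t : t \in [0,\infty)). \]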


A $C[0,\infty)$-valued random variable $Y$ may be adapted to different filtrations
$(\mathcal{F}_t)$. The one defined by $\mathcal{F}_t = \sigma(Y_s : 0 \le s \le t)$ is the minimal filtration of $Y$.

The following result is an immediate consequence of Proposition 9 of Chapter 9.


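Proposition 19. Let $Y$ be a $C[0,\infty)$-valued measurable function defined on
a measurable space $(\Omega, \mathcal{F})$ and adapted to a filtration $(\mathcal{F}_t : t \ge 0)$. Then for each
$t$, the function

\[ (\omega, s) \rightsquigarrow Y_s(\omega), \quad (\omega, s) \in \Omega \times [0, t], \]

is measurable with respect to $\mathcal{F}_t \times \mathcal{B}_t$, where $\mathcal{B}_t$ is the Borel $\sigma$-field of $[0, t]$.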


The measurability property asserted for $Y$ and the filtration in the preceding
proposition is called progressive measurability. This concept is also used for
random variables having values in function spaces other than $C[0,\infty)$, but for
some function spaces, analogues of the preceding proposition may not hold.


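Definition 20. Let $(\mathcal{F}_t : t \in [0,\infty))$ be a filtration in a measurable space
$(\Omega, \mathcal{F})$. An $\overline{\mathbb{R}}^+$-valued random variable $T$ defined on $(\Omega, \mathcal{F})$ is a stopping time
with respect to this filtration if $\{\omega : T(\omega) \le t\} \in \mathcal{F}_t$ for every $t \in \mathbb{R}^+$.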


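Definition and Proposition 21. Let $T$ be a stopping time with respect to
a filtration $(\mathcal{F}_t : t \in [0,\infty))$. The collection of events $A$ such that

\[ A \cap \{\omega : T(\omega) \le t\} \in \mathcal{F}_t \quad \text{for all } t \in \mathbb{R}^+ \]

is a $\sigma$-field. It is denoted by $\mathcal{F}_T$.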


Problem 16. Prove the preceding proposition. 


Problem 17. Prove that any stopping time $T$ is measurable with respect to the
$\sigma$-field $\mathcal{F}_T$.


Problem 18. Let $S \le T$ be stopping times. Prove that $\mathcal{F}_S \subseteq \mathcal{F}_T$.




Proposition 22. Let $Y$ be a $C[0,\infty)$-valued random variable adapted to a
filtration $(\mathcal{F}_t : t \in [0,\infty))$ and let $Y_\infty$ be any $\mathbb{R}$-valued, $\mathcal{F}_\infty$-measurable function.
Then, for any stopping time $T$ with respect to this filtration, $\omega \rightsquigarrow Y_{T(\omega)}(\omega)$ is
measurable with respect to the $\sigma$-field $\mathcal{F}_T$.


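PROOF. Let $B$ be a Borel subset of $\mathbb{R}$ and $t \in \mathbb{R}^+$. We must show

\[ \{\omega : Y_{T(\omega)}(\omega) \in B,\ T(\omega) \le t\} \in \mathcal{F}_t. \]

An equivalent statement is

\[ \{\omega : Y_{T(\omega) \wedge t}(\omega) \in B,\ T(\omega) \le t\} \in \mathcal{F}_t. \]

Since $\{\omega : T(\omega) \le t\} \in \mathcal{F}_t$, we only need show

\[ \{\omega : Y_{T(\omega) \wedge t}(\omega) \in B\} \in \mathcal{F}_t. \]

We can do this by showing that

\[ \omega \rightsquigarrow Y_{T(\omega) \wedge t}(\omega) \]

is a measurable function from $(\Omega, \mathcal{F}_t)$ to $(\mathbb{R}, \mathcal{B})$, where $\mathcal{B}$ denotes the Borel $\sigma$-field
of $\mathbb{R}$. This function is the composition of two functions:

\[ \omega \rightsquigarrow (\omega, T(\omega) \wedge t) \]

from $(\Omega, \mathcal{F}_t)$ to $(\Omega \times [0, t], \mathcal{F}_t \times \mathcal{B}_t)$; and

\[ (\omega, s) \rightsquigarrow Y_s(\omega) \]

from $(\Omega \times [0, t], \mathcal{F}_t \times \mathcal{B}_t)$ to $(\mathbb{R}, \mathcal{B})$. The second of these two functions is measurable
by the progressive measurability of $Y$. The first coordinate of the first of the two
functions is obviously measurable, since $\mathcal{F}_t$ is used for both domain and target.
Finally,

\[ \{\omega : T(\omega) \wedge t \le s\} =
   \begin{cases} \{\omega : T(\omega) \le s\} & \text{if } s < t \\ \Omega & \text{if } s \ge t \end{cases}
   \ \in \mathcal{F}_{s \wedge t} \subseteq \mathcal{F}_t, \]

as desired. □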


Problem 19. Explain why the assertion in Problem 17 is a corollary of the preced- 
ing proposition. 


Problem 20. Show that several other aspects of the theory concerning filtrations
and stopping times carry over from Chapter 11 to the $[0,\infty)$-setting.


The hitting time of a set $A$ by a $C[0,\infty)$-valued measurable function $Y$ is

\[ \inf\{t \ge 0 : Y_t(\omega) \in A\}. \]




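Proposition 23. Let $Y$ be a $C[0,\infty)$-valued random variable and let $C$ be a
closed subset of $\mathbb{R}$. Then the hitting time of $C$ by $Y$ is a stopping time (with
respect to any filtration with respect to which $Y$ is adapted).


PROOF. Let $(\mathcal{F}_t : t \in [0,\infty))$ denote the filtration to which $Y$ is adapted. Let
$T$ denote the hitting time of $C$ and, for $m = 1, 2, \dots$, let

\[ C_m = \{r : |r - c| < \tfrac{1}{m} \text{ for some } c \in C\}. \]

For any $t \in \mathbb{R}^+$, the continuity of $s \rightsquigarrow Y_s(\omega)$ for each $\omega$ gives

\[ \{\omega : T(\omega) \le t\} = \lim_{m\to\infty} \bigcup_{\substack{s \text{ rational} \\ 0 \le s \le t}} \{\omega : Y_s(\omega) \in C_m\} \in \mathcal{F}_t. \quad \Box \]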


Suppose that $Y_0 = 0$, as is the case for Brownian motion. Let $C$ be a closed
subset of $\mathbb{R}$. If $0 \in C$, the hitting time of $C$ is 0. If $0 \notin C$ and $C$ contains at
least one positive number, remove all the positive numbers from $C$ except the
smallest. If $0 \notin C$ and $C$ contains at least one negative number, remove all the
negative numbers from $C$ except the largest. The hitting time of the new set thus
obtained is the same as the hitting time of $C$. The new set is either a two-point
set, a one-point set, or empty. The hitting time in the last case is $\infty$. In the
next section, we will find the moment generating functions of the hitting times
by Brownian motion of one- and two-point sets, and the distributions themselves
for certain two-point sets and all one-point sets.
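
The reduction just described can be sketched in a few lines of Python. This is an
illustration under a simplifying assumption that is not in the text: the closed set
is taken to be finite, so that the smallest positive and largest negative members
can be found by direct comparison. The function name `reduce_target` is
hypothetical.

```python
def reduce_target(points):
    """Reduce a finite set of reals to the points a path from 0 can hit first.

    Returns a sorted list with at most two points: the largest negative
    point and the smallest positive point, or [0.0] if 0 is in the set.
    """
    pts = set(points)
    if 0 in pts:
        return [0.0]  # the path starts in the set, so the hitting time is 0
    negatives = [p for p in pts if p < 0]
    positives = [p for p in pts if p > 0]
    reduced = []
    if negatives:
        reduced.append(max(negatives))  # the closest point below 0
    if positives:
        reduced.append(min(positives))  # the closest point above 0
    return reduced


print(reduce_target([-3, -1, 2, 5]))
print(reduce_target([4, 7]))
print(reduce_target([]))
```

A continuous path started at 0 cannot reach a point beyond the returned ones
without first passing through one of them, which is why the hitting time is
unchanged by the reduction.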

Hitting times of sets that are not closed may not even be stopping times with
respect to minimal filtrations. There exist, for example, members $x$ and $\hat{x}$ of
$C[0,\infty)$ such that $x(s) = \hat{x}(s)$ for $0 \le s \le 2$, the hitting time of $(1,\infty)$ by
$x$ equals 2, and the hitting time of $(1,\infty)$ by $\hat{x}$ is greater than 2. Sometimes
one uses nonminimal filtrations in order that the hitting time of every set be
a stopping time. One commonly used approach is to begin with the minimal
filtration $(\mathcal{F}_t : t \in [0,\infty))$ of some $C[0,\infty)$-valued random variable $Y$ and then
set

\[ (19.14) \qquad \mathcal{F}_{t+} = \bigcap_{s > t} \mathcal{F}_s. \]

Clearly $\mathcal{F}_t \subseteq \mathcal{F}_{t+}$, so $Y$ is adapted to the filtration $(\mathcal{F}_{t+} : t \in [0,\infty))$.


Problem 21. For $Y$ and $(\mathcal{F}_{t+} : t \in [0,\infty))$ as described in the preceding paragraph,
prove that the hitting time of any set by $Y$ is a stopping time.


Problem 22. Show for any filtration $(\mathcal{F}_t : t \in [0,\infty))$, that the filtration $(\mathcal{F}_{t+} : t \in
[0,\infty))$ defined by (19.14) is right-continuous in the sense that $\mathcal{F}_{t+} = \bigcap_{s>t} \mathcal{F}_{s+}$. It
is the minimal right-continuous filtration of $Y$ in case $(\mathcal{F}_t : t \in [0,\infty))$ is minimal
for $Y$.




It is conceivable that a $C[0,\infty)$-valued random variable has some nice properties
with respect to its minimal filtration but loses them if that filtration is
replaced by the minimal right-continuous filtration. However, we will see in the
next section that Brownian motion does not lose its important properties when
such a replacement is made.


19.6. Brownian motion, filtrations, and stopping times 


By virtue of having stationary independent increments, a Brownian motion $W$
on $[0,\infty)$ has the property that for any fixed $s$, the $C[0,\infty)$-valued random
variable $t \rightsquigarrow (W_{s+t} - W_s)$ is a Brownian motion that is independent of $\mathcal{F}_s =
\sigma(W_u : 0 \le u \le s)$. The following result strengthens this assertion.


Proposition 24. Let $(\mathcal{F}_{t+} : t \in [0,\infty))$ be the minimal right-continuous
filtration of a Brownian motion $W$. Then for each $s \in [0,\infty)$, $t \rightsquigarrow (W_{s+t} - W_s)$
is a Brownian motion that is independent of $\mathcal{F}_{s+}$.


PROOF. Since $\mathcal{F}_{s+} \subseteq \mathcal{F}_{s+\varepsilon}$ for $\varepsilon > 0$, the $C[0,\infty)$-valued random variable
$t \rightsquigarrow (W_{s+\varepsilon+t} - W_{s+\varepsilon})$ is a Brownian motion independent of $\mathcal{F}_{s+}$. Now let $\varepsilon \searrow 0$
and use continuity in conjunction with Theorem 20 of Chapter 18 to conclude
that the Brownian motion $t \rightsquigarrow (W_{s+t} - W_s)$ is independent of every $\mathbb{R}$-valued
$\mathcal{F}_{s+}$-measurable random variable and thus of $\mathcal{F}_{s+}$ itself. □


We wish to adapt Corollary 11 of Chapter 11 to the current setting. However,
there is no $C[0,\infty)$-analogue of Proposition 10 of that chapter, so we will need
to use a different approach.


Theorem 25. Let $W$ be a Brownian motion, $(\mathcal{F}_{t+} : t \in [0,\infty))$ the minimal
right-continuous filtration of $W$, and $T$ a stopping time with respect to that
filtration. Suppose that $P[T < \infty] = 1$. Then

\[ (19.15) \qquad t \rightsquigarrow [W_{T+t} - W_T] \]

is a Brownian motion with respect to the filtration $(\mathcal{F}_{(T+t)+} : t \in [0,\infty))$ and is
independent of $\mathcal{F}_{T+}$.


PROOF. From Problem 18 and Proposition 22 we see that $(\mathcal{F}_{(T+t)+})$ is a
filtration and that (19.15) is an almost surely defined $C[0,\infty)$-valued random
variable adapted to this filtration. (This observation includes the fact that the
set of $\omega$ where (19.15) does not apply is $\{\omega : T(\omega) = \infty\}$, which is a member of
every $\sigma$-field in the filtration $(\mathcal{F}_{(T+t)+})$.)

For each $\omega$, set $T_n(\omega) = \frac{1}{n} \lceil nT(\omega) \rceil$. For each $k \in \mathbb{Z}^+$,

\[ \{\omega : T_n(\omega) \le \tfrac{k}{n}\} = \{\omega : T(\omega) \le \tfrac{k}{n}\} \in \mathcal{F}_{(k/n)+}. \]

Therefore, $T_n$ is a stopping time with respect to the (discrete time) filtration
$(\mathcal{F}_{(k/n)+} : k = 0, 1, 2, \dots)$. Also, $(W_{k/n} : k = 0, 1, 2, \dots)$ is a random walk, and




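so Proposition 24 and an easy adaptation of Corollary 11 of Chapter 11 imply
that

\[ (19.16) \qquad k \rightsquigarrow [W_{T_n + k/n} - W_{T_n}] \]

is a random walk that is independent of $\mathcal{F}_{T_n+}$. By Problem 18, $\mathcal{F}_{T+} \subseteq \mathcal{F}_{T_n+}$;
thus this random walk is independent of $\mathcal{F}_{T+}$.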

Let $X$ be the object defined in (19.15). Since $X$ is $C[0,\infty)$-valued, its values
at various times $t$ can be expressed as limits of the values of the random walks
defined in (19.16), each of which has independent increments and is independent
of $\mathcal{F}_{T+}$. Since independence is preserved in the limit (Theorem 20 of Chapter 18),
it follows that $X$ has independent increments and is independent of $\mathcal{F}_{T+}$. It is
also easy to check by taking limits of the increments of the random walks that
the distribution of $X_t - X_s$ is normal with mean 0 and variance $t - s$ for
$0 \le s \le t$. Thus, $X$ is a Wiener process that is independent of $\mathcal{F}_{T+}$, as desired. □
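
The discretization step in the preceding proof can be made concrete. The Python
sketch below is an illustration only; it assumes that the approximating times are
$T_n = \lceil nT \rceil / n$, the smallest multiple of $1/n$ that is at least $T$, and it uses
exact rational arithmetic so that no rounding issues arise. The name `discretize`
and the sample value $7/10$ are hypothetical.

```python
from fractions import Fraction
from math import ceil


def discretize(t, n):
    """Return T_n = ceil(n * t) / n, the smallest multiple of 1/n that is >= t."""
    return Fraction(ceil(n * t), n)


t = Fraction(7, 10)  # a hypothetical value T(omega) of the stopping time
for n in (1, 3, 10, 100):
    t_n = discretize(t, n)
    print(n, t_n, t_n - t)  # T_n and the overshoot T_n - T, which is below 1/n
```

Under this assumption $T \le T_n < T + 1/n$, and $T_n \le k/n$ holds exactly when
$T \le k/n$, which is the identity used to show that $T_n$ is a stopping time for the
discrete-time filtration.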


In view of Theorem 25, Problem 21, and Problem 14, the hitting time of 
any set by Brownian motion almost surely equals the hitting time of its closure. 
Thus, the study of distributions of hitting times of arbitrary sets reduces to the 
study of distributions of hitting times of one- and two-point sets. We begin by 
treating the one-point sets. 

For $b > 0$, let $T_b(x)$ equal the hitting time of the singleton $\{b\}$ by $x \in C[0,\infty)$.
Let $W$ denote a Wiener process. By Proposition 23, $T_b$ is a stopping time with
respect to any filtration to which $W$ is adapted. We set two goals: to find the
distribution of each $\overline{\mathbb{R}}^+$-valued random variable $T_b \circ W$ and to study the random
function $b \rightsquigarrow T_b \circ W$. Equivalently, with $Q$ denoting the distribution of $W$, our
goals are to find the distribution of each $T_b$ viewed as a random variable on
$(C[0,\infty), \mathcal{C}, Q)$ and study the random function $x \rightsquigarrow [b \rightsquigarrow T_b(x)]$. As an aid we
use $M_t$ defined by

\[ M_t(x) = \max\{x(s) : 0 \le s \le t\}. \]

Clearly,

\[ (19.17) \qquad \{x \in C[0,\infty) : T_b(x) \le t\} = \{x \in C[0,\infty) : M_t(x) \ge b\}. \]

Theorem 9 and Theorem 16 yield the distribution of $M_t \circ W$ for each $t$. Thus,
we can determine the distribution of $T_b$ from (19.17).


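Theorem 26. Let $Q$ denote Wiener measure on $C[0,\infty)$. For all $b > 0$, $T_b$,
defined on $(C[0,\infty), \mathcal{C}, Q)$ as the hitting time of $\{b\}$, has a stable distribution of
index $1/2$ which has the continuous density

\[ t \rightsquigarrow \begin{cases} \dfrac{b}{\sqrt{2\pi}\, t^{3/2}}\, e^{-b^2/(2t)} & \text{if } t > 0 \\[1ex] 0 & \text{if } t \le 0 \end{cases} \]

and the moment generating function $u \rightsquigarrow \exp(-b\sqrt{2u})$.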




PROOF. From Problem 16 of Chapter 15 we see that the density and the moment
generating function in the theorem correspond to each other, and that they
are stable of index $1/2$. Thus, we only need verify the formula for the density.
By (19.17) and the invariance of Brownian motion under scaling (Theorem 16),

\[ Q[T_b \le t] = Q[M_t \ge b] = Q\bigl[M_1 \ge \tfrac{b}{\sqrt{t}}\bigr] \]

for $t > 0$. By Theorem 9, this last quantity equals

\[ \sqrt{\frac{2}{\pi}} \int_{b/\sqrt{t}}^{\infty} e^{-u^2/2}\, du. \]

Differentiation with respect to $t$ gives the formula in the theorem, as desired. □
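
The differentiation step can be checked numerically. The following Python sketch
is an illustration only: it assumes that the distribution function is
$\sqrt{2/\pi} \int_{b/\sqrt{t}}^{\infty} e^{-u^2/2}\,du$, which equals $\operatorname{erfc}(b/\sqrt{2t})$, and that the
candidate density is $b\,(2\pi)^{-1/2}\, t^{-3/2} e^{-b^2/(2t)}$ for $t > 0$. The function names and
the step size are arbitrary choices.

```python
import math


def hitting_cdf(t, b):
    """Q[T_b <= t], written with the complementary error function."""
    if t <= 0:
        return 0.0
    return math.erfc(b / math.sqrt(2.0 * t))


def hitting_density(t, b):
    """Candidate density b / (sqrt(2 pi) t^(3/2)) * exp(-b^2 / (2 t)) for t > 0."""
    if t <= 0:
        return 0.0
    return b / (math.sqrt(2.0 * math.pi) * t ** 1.5) * math.exp(-b * b / (2.0 * t))


def central_difference(f, t, h=1e-5):
    """Symmetric finite-difference approximation to f'(t)."""
    return (f(t + h) - f(t - h)) / (2.0 * h)


b = 1.0
for t in (0.5, 1.0, 4.0):
    slope = central_difference(lambda s: hitting_cdf(s, b), t)
    print(t, slope, hitting_density(t, b))
```

The two printed columns should agree to many decimal places, which is
consistent with the density being the derivative of the distribution function.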


FIGURE 19.1. Stable distribution function of index 1/2 and a 
related empirical distribution function 


Problem 23. The distribution function described in Theorem 26 is illustrated in 
Figure 19.1 for the case $b = 7$. Also shown (with jumps filled in as a visual aid)
is a portion of the empirical distribution obtained when 40 people each did the 
following fair-coin-flip experiment. Each flipped until the number of heads exceeded 
the number of tails by 7 or the total number of flips equaled 98, whichever came 
first. In the former case the person recorded the total number of flips, whereas 
in the latter case the person recorded that more than 98 flips would be needed. 
Discuss reasons for placing the two graphs on the same coordinate system. 


Our knowledge of stable distributions shows that $T_b < \infty$ a.s. (This conclusion
can also be drawn from the Law of the Iterated Logarithm at $\infty$ given in the




last section of this chapter.) The assumption in the preceding discussion that
$b > 0$ was for convenience and, since the distribution of $T_{-b}$ is the same as the
distribution of $T_b$, it entailed no loss of generality. The distribution of $T_0$ is the
delta distribution at 0.

We have been using the terms ‘stationary increments’, ‘independent increments’
and ‘adapted to a filtration’ for $C[0,1]$- and $C[0,\infty)$-valued random variables.
However, these terms may equally well be used for random variables
taking values in other function spaces, as in the following theorem.


Theorem 27. Let $Q$ denote Wiener measure on $C[0,\infty)$. For $b \ge 0$, let
$T_b(x)$ denote the hitting time of $\{b\}$ by $x \in C[0,\infty)$. Then the random function
$b \rightsquigarrow T_b$ defined on $(C[0,\infty), \mathcal{C}, Q)$ has stationary independent increments and is
surely left-continuous.


Problem 24. Prove the preceding theorem. Hint: Use Proposition 5 of Chapter 9 
and Theorem 25 of this chapter. 


FIGURE 19.2. Hitting time process $b \rightsquigarrow T_b(x)$ for some $x \in C[0,\infty)$


The function $b \rightsquigarrow T_b(x)$ for a particular $x$ is shown in Figure 19.2 with $b$ on
the vertical axis. It is clear that $b \rightsquigarrow T_b(x)$ is not right-continuous. This fact is
not inconsistent with the assertion in the following problem that for any fixed $b$,
this function is continuous at $b$ a.s.


Problem 25. Let $Q$ denote Wiener measure on $C[0,\infty)$ and let $b > 0$. Prove that

\[ Q(\{x : T_b(x) = \lim_{c \searrow b} T_c(x)\}) = 1. \]


The random function in Theorem 27 is, except for a technicality, an example 
from the class of random functions treated in Chapter 30. For it to fit exactly 




into the setting of Chapter 30 it would have to be redefined at its jumps so as to
be right-continuous: $T_{b+}(x) = \lim_{c \searrow b} T_c(x)$. The term first passage time of $b$ is
used for $T_{b+}$. The preceding problem says that, for each fixed $b$, the first passage
time of $b$ and the hitting time of $b$ by a Brownian motion are almost surely equal.

We switch from treating hitting times of one-point sets to studying hitting 
times of two-point sets. 


Theorem 28. For $-a < 0 < b$ let $T_{a,b}(x)$ denote the hitting time of the two-
point set $\{-a, b\}$ by $x \in C[0,\infty)$. Then the moment generating function of $T_{a,b}$,
when governed by Wiener measure on $C[0,\infty)$, is the function

\[ u \rightsquigarrow \frac{\sinh(a\sqrt{2u}) + \sinh(b\sqrt{2u})}{\sinh((a+b)\sqrt{2u})}, \]

where sinh denotes the hyperbolic sine function.
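
The formula of Theorem 28 is easy to probe numerically. The Python sketch below
is an illustration only. It assumes that the moment generating function of a
nonnegative random variable $T$ is taken here in the form $u \rightsquigarrow E(e^{-uT})$ for $u > 0$,
as the formula $\exp(-b\sqrt{2u})$ in Theorem 26 suggests; under that assumption
$(1 - E(e^{-uT}))/u$ approximates $E(T)$ for small $u$. The function names are
hypothetical.

```python
import math


def exit_time_mgf(a, b, u):
    """Value of the displayed formula at u > 0 for the two-point set {-a, b}."""
    theta = math.sqrt(2.0 * u)
    return (math.sinh(a * theta) + math.sinh(b * theta)) / math.sinh((a + b) * theta)


def slope_at_zero(a, b, u=1e-6):
    """(1 - mgf(u)) / u for small u: a finite-difference estimate of the mean."""
    return (1.0 - exit_time_mgf(a, b, u)) / u


print(exit_time_mgf(1.0, 1.0, 0.5), 1.0 / math.cosh(1.0))  # symmetric case a = b
print(slope_at_zero(1.0, 2.0))                             # compare with a * b
```

In the symmetric case $a = b$ the formula reduces to $1/\cosh(a\sqrt{2u})$, and the
finite-difference estimate is close to $ab$, the value of the mean stated in
Problem 27.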


Problem 26. Prove the preceding theorem. Hint: Show that the Wiener measure
of the boundary of $\{x : T_{a,b}(x) \le t\}$ equals 0 if $Q(\{x : T_{a,b}(x) = t\}) = 0$. Also, use
Example 4 of Chapter 11.


* Problem 27. Show that $E_Q(T_{a,b}) = ab$, where $T_{a,b}$ is defined in the preceding
theorem and the subscript $Q$, denoting Wiener measure, indicates that $T_{a,b}$ is regarded
as a random variable on $(C[0,\infty), \mathcal{C}, Q)$.


Problem 28. Continuing with the notation of Theorem 28 show that the density
of $T_{b,b}$ with respect to Lebesgue measure on $(0,\infty)$ is

\[ t \rightsquigarrow b \sqrt{\frac{2}{\pi t^3}} \sum_{k=0}^{\infty} (-1)^k (2k+1)\, e^{-(2k+1)^2 b^2/(2t)}. \]


Hint: Problem 16 of Chapter 15 may be useful. 


19.7. ‡ Characterization of Brownian motion


We will show that condition (i) in Lemma 2 plays a minor role in that lemma. 
The proof of this fact relies on the convergence theorem for row-wise iid triangular 
arrays (Theorem 22 of Chapter 16). 


Theorem 29. Let $V$ be a $C[0,\infty)$-valued random variable such that $V_0 =
0$ a.s. If $V$ has stationary independent increments, then either there exists a
constant $b \in \mathbb{R}$ such that $V_t = bt$ a.s., or there exist constants $a > 0$ and $b \in \mathbb{R}$
such that $t \rightsquigarrow (V_t - bt)/a$ is a Brownian motion on $[0,\infty)$.


PROOF. We begin by proving that for all $t \in [0,\infty)$, $V_t$ is normally distributed
(with possibly 0 variance). Fix $t$ and set

\[ X_{m,n} = V_{mt/n} - V_{(m-1)t/n} \]




for $m = 1, \dots, n$ and $n = 1, 2, \dots$. Since $V$ has stationary independent
increments, each row of the resulting triangular array $(X_{m,n})$ is iid. For each fixed $n$,
the row sum is

\[ \sum_{m=1}^{n} X_{m,n} = V_t \quad \text{a.s.}, \]

so the distribution of $V_t$ is obviously infinitely divisible. Let $\nu$ be the Lévy
measure of the distribution of $V_t$. We wish to show that $\nu$ is the zero measure.
For each $n$, let $M_n = \max\{|X_{m,n}| : m = 1, \dots, n\}$. By Problem 48 of Chapter 16,
$M_n \xrightarrow{\mathcal{D}} M$ as $n \to \infty$, where $M$ is a nonnegative random variable with
distribution function

\[ z \rightsquigarrow e^{-\nu(z,\infty) - \nu(-\infty,-z)}, \quad z > 0. \]

On the other hand, since $V$ is $C[0,\infty)$-valued, $s \rightsquigarrow V_s(\omega)$ is a uniformly continuous
function on $[0, t]$ for each $\omega$. The definition of uniform continuity implies that for
each $\omega$, $M_n(\omega) \to 0$ as $n \to \infty$. Thus $M = 0$ a.s., and so $\nu$ is the zero measure.
It follows from the Lévy-Khinchin Representation Theorem that each $V_t$ has a
(possibly degenerate) normal distribution.

Using the fact that $V$ has stationary independent increments, it is easy to see
that

\[ \frac{\operatorname{Var}(V_t)}{t} \quad \text{and} \quad \frac{E(V_t)}{t} \]

do not depend on $t$, first for rational $t$ and then, by taking limits, for all $t \in (0,\infty)$.
Denoting these two quantities by $a^2$ and $b$, respectively, with $a \ge 0$, we see that $V_t$ is
normally distributed with mean $bt$ and variance $a^2 t$.

The proof is complete in case $a = 0$. If $a > 0$, let $W_t = (V_t - bt)/a$. It is
easily checked that $W$ has stationary independent increments, and that for each
$t \in (0,\infty)$, $W_t$ is normally distributed with mean 0 and variance $t$. Thus $W$ is a
Brownian motion. □


The terms ‘Brownian’ and ‘Wiener’ are often used in conjunction with any
$C[0,1]$-valued random variable $V$ having stationary independent increments.
The process in case $a = 1$, $b = 0$ is a standard Brownian motion, and the cases
with $a = 0$ are called degenerate. If $a, b \ne 0$, then $V$ is a Brownian motion with
drift. The preceding sections of this chapter treat standard Wiener measure and
standard Brownian motion, although the adjective ‘standard’ is not used there.


19.8. ‡ Law of the Iterated Logarithm


This section is devoted to the following result which describes more accurately
than does Proposition 17 the behavior for large times of the large values of
Brownian motion on $[0,\infty)$. A corollary describing the behavior near time 0 is
also included.




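Theorem 30. [Law of the Iterated Logarithm at $\infty$] Let $W$ be a Brownian
motion on $[0,\infty)$. Then

\[ \limsup_{t\to\infty} \frac{W_t}{\sqrt{2t \log(\log t)}} = 1 \quad \text{a.s.} \]


PARTIAL PROOF. Let $\varepsilon > 0$. We can finish the proof by showing that there
exist almost surely defined random variables $T$ and $T_1 < T_2 < T_3 < \cdots \to \infty$
such that

\[ (19.18) \qquad W_t(\omega) \le (1+\varepsilon)\sqrt{2t \log(\log t)} \quad \text{for } t \ge T(\omega), \]

and

\[ (19.19) \qquad W_{T_n(\omega)}(\omega) > (1-\varepsilon)\sqrt{2T_n(\omega) \log(\log T_n(\omega))} \quad \text{for } n = 1, 2, \dots. \]

We can prove (19.18) by showing the existence of $T(\omega)$ such that

\[ (19.20) \qquad M_t \circ W(\omega) \le (1+\varepsilon)\sqrt{2t \log(\log t)} \quad \text{for } t \ge T(\omega), \]

where $M_t(x) = \max\{x(s) : 0 \le s \le t\}$. Let $c > 1$. Because of the monotonicity
of $t \rightsquigarrow M_t(x)$ for each $x$, (19.20) will follow from $P(\limsup A_n) = 0$, where

\[ A_n = \Bigl[ M_{c^{n+1}} \circ W > (1+\varepsilon)\sqrt{2c^n \log(\log c^n)} \Bigr]. \]

By first using Theorem 9 and part (iii) of Theorem 16 and then using the
asymptotic relation (9.12) of Chapter 9 we obtain

\[ P(A_n) = \sqrt{\frac{2}{\pi}} \int_{(1+\varepsilon)c^{-1/2}\sqrt{2\log(\log c^n)}}^{\infty} e^{-u^2/2}\, du
   \sim \frac{\sqrt{c}}{\sqrt{\pi}\,(1+\varepsilon)\,(n \log c)^{(1+\varepsilon)^2/c}\,\sqrt{\log(n \log c)}}. \]

Since $c > 1$ is, at this point in the argument, arbitrary, we may choose it to
be less than $(1+\varepsilon)^2$. Then $\sum P(A_n) < \infty$, so the Borel Lemma implies that
$P(\limsup A_n) = 0$, as desired.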

It remains to define $T_1(\omega) < T_2(\omega) < \cdots \to \infty$ that satisfy (19.19). As in
the preceding paragraph we introduce $c > 1$, to be further specified later in the
argument, with the intention that $(T_n)$ be a random subsequence of $(c^k : k =
1, 2, \dots)$. Let

\[ B_k = \Bigl[ W_{c^k} > W_{c^{k-1}} + \bigl(1 - \tfrac{\varepsilon}{2}\bigr)\sqrt{2c^k \log(\log c^k)} \Bigr]. \]

Because $W$ has stationary increments,

\[ P(B_k) = P\Bigl[ W_{c^k - c^{k-1}} > \bigl(1 - \tfrac{\varepsilon}{2}\bigr)\sqrt{2c^k \log(\log c^k)} \Bigr]. \]

It is left to the reader to show that $\sum P(B_k) = \infty$ for sufficiently large $c$.
The independent increment property implies that the events $B_k$ are independent




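and, therefore, by Borel-Cantelli, that $P(\limsup B_k) = 1$. By the first part of
the proof applied to $-W$,

\[ (19.21) \qquad W_{c^{k-1}}(\omega) > -(1+\varepsilon)\,c^{-1/2}\sqrt{2c^k \log(\log c^k)} \]

for all sufficiently large $k$, depending on $\omega$. For such a $k$ that also satisfies $\omega \in B_k$,
we have

\[ W_{c^k}(\omega) > \Bigl[ \bigl(1 - \tfrac{\varepsilon}{2}\bigr) - \frac{1+\varepsilon}{\sqrt{c}} \Bigr] \sqrt{2c^k \log(\log c^k)}. \]

Choose $c$ larger than $4(1+\varepsilon)^2/\varepsilon^2$. Then (19.19) is obtained by defining each $T_n(\omega)$
to equal some $c^k$, where $k$ is chosen so that $\omega \in B_k$ and (19.21) holds. □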


Problem 29. Complete the preceding proof by showing that $\sum P(B_k) = \infty$ for
sufficiently large $c$.


Corollary 31. [Law of the Iterated Logarithm at 0] Let $W$ denote a Brownian
motion on $[0,\infty)$. Then

\[ \limsup_{t \searrow 0} \frac{W_t}{\sqrt{2t \log(|\log t|)}} = 1 \quad \text{a.s.} \]


Problem 30. Prove the preceding corollary. Hint: Use Theorem 30 and Theo- 
rem 16. 


Problem 31. State a Law of the Iterated Logarithm for the simple symmetric ran- 
dom walk in Z. Prove your assertion by using stopping times to ‘embed’ the random 
walk in a Brownian motion. 


PART 4 


Conditioning 




If X and Y are two random variables defined on the same probability space, it 
is often the case that information about the value of X also provides information 
about the value of Y. The exception to this was studied in Part 2, namely the 
case in which X and Y are stochastically independent. In general, we will want 
to develop techniques for studying the way in which knowledge about X affects 
our assessment of the value of Y. This subject matter comes under the general 
heading of ‘conditioning’. After the introduction of some preliminary concepts 
from the field of real analysis in Chapter 20, the basic tools of conditioning, 
namely conditional probabilities and conditional distributions, are introduced in 
Chapter 21. These tools will enable us to construct several important classes of 
random sequences, as shown in Chapter 22. In Chapter 23, we study conditional 
expectations, which may be regarded as the means of conditional probability 
distributions. 


Note: Readers who have skipped Part 3 should read the comment at the end 
of the introduction to that part concerning notational shortcuts. 


CHAPTER 20 
Spaces of Random Variables 


This chapter is chiefly concerned with two metric spaces consisting of collections
of random variables on a probability space $(\Omega, \mathcal{F}, P)$: $L_1(\Omega, \mathcal{F}, P)$, consisting of
all random variables $X \colon \Omega \to \mathbb{R}$ such that $E(|X|) < \infty$, and $L_2(\Omega, \mathcal{F}, P)$,
consisting of those $X$ for which $E(X^2) < \infty$. The space $L_2(\Omega, \mathcal{F}, P)$ has additional
structure which makes it a ‘Hilbert space’. General Hilbert spaces are introduced
in the first section, and $L_2(\Omega, \mathcal{F}, P)$ is treated in the second section. Basic results
from these two sections will play an important role in the definition of conditional
probability distributions in Chapter 21. The metric space $L_1(\Omega, \mathcal{F}, P)$ is
discussed briefly in the third section, and the final section of the chapter treats
an application of Hilbert space methods to an estimation problem.


20.1. Hilbert spaces 


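Let $\mathcal{V}$ be a vector space. That is, $\mathcal{V}$ is a set of objects for which two operations
are defined, addition and scalar multiplication. The operation of addition assigns
to each pair $u, v \in \mathcal{V}$ an element $u + v \in \mathcal{V}$. Addition in $\mathcal{V}$ is associative and
commutative; there is an additive identity, denoted by 0, which has the property
that $u + 0 = u$ for all $u \in \mathcal{V}$; and for each $u \in \mathcal{V}$ there is an additive inverse,
denoted by $-u$, such that $u + (-u) = 0$. The operation of scalar multiplication
assigns to each real number $a$ and each member $u$ of $\mathcal{V}$ an element $au \in \mathcal{V}$. This
operation satisfies

\[ \begin{aligned}
a(u + v) &= au + av && \text{for } a \in \mathbb{R} \text{ and } u, v \in \mathcal{V}; \\
(a + b)v &= av + bv && \text{for } a, b \in \mathbb{R} \text{ and } v \in \mathcal{V}; \\
(ab)v &= a(bv) && \text{for } a, b \in \mathbb{R} \text{ and } v \in \mathcal{V}; \\
1v &= v && \text{for } v \in \mathcal{V}.
\end{aligned} \]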


We will be interested in vector spaces $\mathcal{V}$ that have additional structure. Suppose
there exists a function that assigns to each pair $(u, v) \in \mathcal{V} \times \mathcal{V}$ a real number



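$\langle u, v \rangle$ so that the following properties hold:

\[ \begin{aligned}
\langle 0, v \rangle &= 0 && \text{for } v \in \mathcal{V}; \\
\langle v, v \rangle &> 0 && \text{for } v \in \mathcal{V},\ v \ne 0; \\
\langle u, v \rangle &= \langle v, u \rangle && \text{for } u, v \in \mathcal{V}; \\
\langle u + v, w \rangle &= \langle u, w \rangle + \langle v, w \rangle && \text{for } u, v, w \in \mathcal{V}; \\
a \langle u, v \rangle &= \langle au, v \rangle && \text{for } a \in \mathbb{R} \text{ and } u, v \in \mathcal{V}.
\end{aligned} \]

Then $\mathcal{V}$ is called an inner product space.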
For $v$ a member of an inner product space $\mathcal{V}$, we define the norm of $v$ by

\[ \|v\| = \sqrt{\langle v, v \rangle}. \]

The distance between two members $u$ and $v$ of $\mathcal{V}$ is $\|v - u\|$. It can be checked
that this distance is in fact a metric, since the defining properties of a metric
hold: $\|v - u\| = \|u - v\| > 0$ if $v \ne u$, $\|v - v\| = 0$, and

\[ \|w - u\| \le \|v - u\| + \|w - v\|. \]

Thus, an inner product space is a metric space with additional structure.


Definition 1. An inner product space is called a Hilbert space if it is complete 
as a metric space. 


It is easy to show that every finite-dimensional inner product space is com- 
plete, and thus a Hilbert space. 

Suppose that $\mathcal{V}$ is a Hilbert space and $\mathcal{U} \subseteq \mathcal{V}$. Then $\mathcal{U}$ is called a Hilbert
subspace of $\mathcal{V}$ if $\mathcal{U}$ is itself a Hilbert space with respect to the vector operations
and inner product inherited from $\mathcal{V}$.


Proposition 2. Let $\mathcal{V}$ be a Hilbert space and $\mathcal{U}$ a Hilbert subspace of $\mathcal{V}$. Then
for each $x \in \mathcal{V}$, there exists a unique $z \in \mathcal{U}$ such that

\[ \|x - z\| = \inf\{\|x - y\| : y \in \mathcal{U}\}. \]

Furthermore, $z$ is the unique vector in $\mathcal{U}$ satisfying

\[ \langle y, z - x \rangle = 0 \]

for all $y \in \mathcal{U}$.

PROOF. Fix $x \in \mathcal{V}$. Find a sequence $(z_n)$ contained in $\mathcal{U}$ such that

\[ \|x - z_n\| \to \inf\{\|x - z\| : z \in \mathcal{U}\} \]

as $n \to \infty$. We wish to prove that this sequence is Cauchy. Suppose that it were
not. Then we may assume (by taking a subsequence if necessary) that there
exists an $\varepsilon > 0$ such that for all $n$, $\|z_{n+1} - z_n\| > \varepsilon$. Let $y_n = (z_n + z_{n+1})/2$.
Then straightforward algebra with the inner product shows that

\[ (20.1) \qquad 2\|z_n - x\|^2 + 2\|z_{n+1} - x\|^2 - 4\|y_n - x\|^2 = \|z_{n+1} - z_n\|^2 > \varepsilon^2. \]




On the other hand, in view of the defining property of the sequence $(z_n)$ and
the fact that $y_n \in \mathcal{U}$, the limit supremum of the left side of (20.1) is nonpositive.
We have arrived at a contradiction, thereby showing that $(z_n)$ is Cauchy. Since
$\mathcal{U}$ is complete, the sequence $(z_n)$ converges to a limit $z \in \mathcal{U}$. By the triangle
inequality,

\[ \|x - z\| \le \|x - z_n\| + \|z_n - z\| \]

for all $n$. It follows that no member of $\mathcal{U}$ is closer to $x$ than $z$ is, since $\|z_n - z\| \to 0$
as $n \to \infty$. Thus, we have proved the existence of a member of $\mathcal{U}$ closest to $x$.

Were there a second member $z'$ of $\mathcal{U}$ as close to $x$ as is $z$, then we could use
(20.1) with $z_n = z$ and $z_{n+1} = z'$ to show that the average of $z$ and $z'$ would
be closer to $x$ than $z$ is, and we would have a contradiction. Therefore, $z$ is the
unique member of $\mathcal{U}$ that is, among members of $\mathcal{U}$, closest to $x$.

For $y = 0$, the equality $\langle y, z - x \rangle = 0$ is obvious. To obtain it for other $y \in \mathcal{U}$,
consider the quantity

\[ (20.2) \qquad \|ay + z - x\|^2 = a^2 \|y\|^2 + 2a \langle y, z - x \rangle + \|z - x\|^2, \quad a \in \mathbb{R}. \]

Since $ay + z \in \mathcal{U}$, we know that the left side, and therefore the right side, of
(20.2) has a unique minimum at $a = 0$. On the other hand, from elementary
calculus (or standard facts about quadratic polynomials), we know that there is
a unique minimum at $a = -\langle y, z - x \rangle / \|y\|^2$. It follows that $\langle y, z - x \rangle = 0$.

Finally, we note that if $u$ is any vector in $\mathcal{U}$ that satisfies $\langle y, u - x \rangle = 0$ for all
$y \in \mathcal{U}$, then

\[ \langle z - u, z - u \rangle = \langle z - u, z - x + x - u \rangle = \langle z - u, z - x \rangle + \langle z - u, x - u \rangle = 0, \]

since $z - u \in \mathcal{U}$. It follows that $z = u$. □


The vector $z$, whose existence and uniqueness is asserted in the preceding
proposition, is called the orthogonal projection of $x$ onto $\mathcal{U}$, and $\|x - z\|$ is the
distance between $x$ and $\mathcal{U}$.
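
In a finite-dimensional setting the orthogonal projection can be computed
explicitly. The Python sketch below is an illustration only and is not the method
of the preceding proof: it assumes that the space is $\mathbb{R}^d$ with the usual dot product
and that the subspace is given by a finite spanning list, and it uses Gram-Schmidt
orthogonalization. The names `dot` and `project` are hypothetical.

```python
def dot(u, v):
    """Usual inner product on R^d."""
    return sum(a * b for a, b in zip(u, v))


def project(x, spanning):
    """Orthogonal projection of x onto the span of the vectors in `spanning`."""
    ortho = []  # an orthogonal basis of the span, built by Gram-Schmidt
    for v in spanning:
        w = [float(c) for c in v]
        for q in ortho:
            coef = dot(w, q) / dot(q, q)
            w = [wi - coef * qi for wi, qi in zip(w, q)]
        if dot(w, w) > 1e-12:  # skip vectors already (numerically) in the span
            ortho.append(w)
    z = [0.0] * len(x)
    for q in ortho:
        coef = dot(x, q) / dot(q, q)
        z = [zi + coef * qi for zi, qi in zip(z, q)]
    return z


x = [1.0, 2.0, 3.0]
spanning = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
z = project(x, spanning)
residual = [zi - xi for zi, xi in zip(z, x)]
print(z, [dot(y, residual) for y in spanning])
```

For this choice of $x$ and spanning vectors the projection is $(1, 2, 0)$, and the
residual $z - x$ has inner product 0 with each spanning vector, in agreement with
the characterization $\langle y, z - x \rangle = 0$.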


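Problem 1. Let $\mathcal{W}$ be a Hilbert space, $\mathcal{V}$ a Hilbert subspace of $\mathcal{W}$, and $\mathcal{U}$ a Hilbert
subspace of $\mathcal{V}$. Let $x, y, z \in \mathcal{W}$ be such that $y$ is the projection of $x$ onto $\mathcal{V}$ and $z$
is the projection of $y$ onto $\mathcal{U}$. Show that $z$ is the projection of $x$ onto $\mathcal{U}$.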


20.2. The Hilbert space $L_2(\Omega, \mathcal{F}, P)$


We set the goal of constructing a Hilbert space consisting of the $\mathbb{R}$-valued random
variables that have finite second moments on some probability space $(\Omega, \mathcal{F}, P)$.


Problem 2. Prove that the sum of two $\mathbb{R}$-valued random variables has finite second
moment if each of the summands has finite second moment. Also, show that the
product of a real number and an $\mathbb{R}$-valued random variable $X$ has finite second
moment if $X$ itself has finite second moment.




The preceding problem indicates that we can, in fact, form a vector space 
the members of which are random variables having finite second moments. The 
operations of multiplication by a scalar and addition are the usual operations 
with functions, and the appropriate commutative, associative, and distributive 
properties hold as consequences of the corresponding properties of real numbers. 
The same can be said for the other properties of vector spaces. 

We wish to turn the vector space of the preceding paragraph into an inner 
product space. However, before doing so, it is necessary to modify the situation 
slightly by considering two random variables to be equivalent if they are equal 
almost surely. Thus, rather than considering the space of random variables with 
finite second moments on some probability space, we instead consider the space 
of equivalence classes of such random variables. We leave it to the reader to check 
that the properties required of vector spaces remain valid after this modification. 

The preceding paragraph notwithstanding, we often speak of a random vari- 
able when we actually mean the equivalence class containing that random vari- 
able. For instance, in Definition 3 below, we define the inner product of two ran- 
dom variables X and Y with finite second moments to be the quantity E(XY). 
The following exercise asks the reader to prove several properties of the expec- 
tation that will show that this formula does indeed define an inner product. 


Problem 3. Prove or extract from previous work the following facts about real
numbers $a$ and $b$ and random variables $X$, $Y$, and $Z$ having finite second moments:

(i) $-\infty < E(XY) < \infty$;

(ii) $E(XY) = E(YX)$;

(iii) $E(X(aY + bZ)) = aE(XY) + bE(XZ)$;

(iv) $E(XX) \ge 0$;

(v) $E(XX) = 0 \Leftrightarrow X = 0$ a.s.;

(vi) $E(XY) = E(XZ)$ if $Y = Z$ a.s.


Thus, we may make the following definition. 


Definition 3. Let $(\Omega, \mathcal{F}, P)$ denote a probability space. By

\[ L_2(\Omega, \mathcal{F}, P) \]

we denote the inner product space consisting of all equivalence classes of $\mathbb{R}$-valued
random variables on $(\Omega, \mathcal{F}, P)$ that have finite second moments, with addition
of vectors and multiplication by scalars defined in the usual way for functions,
and the inner product of two members $X$ and $Y$ of $L_2(\Omega, \mathcal{F}, P)$ defined by

\[ \langle X, Y \rangle = E(XY). \]

We sometimes write $L_2$ for $L_2(\Omega, \mathcal{F}, P)$ when there is no need to explicitly mention
the underlying probability space $(\Omega, \mathcal{F}, P)$.
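
On a finite probability space the inner product of Definition 3 is a weighted
sum, which makes it easy to experiment with. The Python sketch below is an
illustration only: it assumes a sample space with finitely many points, each
carrying a positive probability, and represents a random variable as the list of its
values at those points. The names `inner` and `norm2` and the fair-die example
are hypothetical.

```python
import math


def inner(x, y, p):
    """<X, Y> = E(XY) on a finite probability space with point masses p."""
    return sum(pi * xi * yi for pi, xi, yi in zip(p, x, y))


def norm2(x, p):
    """The norm sqrt(<X, X>) associated with the inner product."""
    return math.sqrt(inner(x, x, p))


p = [1.0 / 6.0] * 6            # a fair die
X = [1, 2, 3, 4, 5, 6]         # the face shown
Y = [0, 1, 0, 1, 0, 1]         # indicator that the face is even

print(inner(X, Y, p))          # E(XY)
print(norm2(X, p) * norm2(Y, p))
```

Here $E(XY) = (2 + 4 + 6)/6 = 2$, and the second printed value is an upper bound
for it, as the Cauchy-Schwarz inequality for inner product spaces requires.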




We call the norm associated with $L_2$ the $L_2$-norm and denote it by

\[ \|X\|_2 = \sqrt{\langle X, X \rangle}. \]

Thus, the distance between two members $X$ and $Y$ of $L_2$ is given by

\[ d(X, Y) = \|X - Y\|_2. \]

Proposition 4. $L_2(\Omega, \mathcal{F}, P)$ is a Hilbert space.


PROOF. All but completeness has been checked above. Let $(X_1, X_2, \dots)$ be
a Cauchy sequence in $L_2$. Choose a subsequence $(X_{n_k} : k = 1, 2, \dots)$ such that

\[ \|X_{n_{k+1}} - X_{n_k}\|_2 \le 2^{-k} \]

for all $k$. By the Markov Inequality (Proposition 3 in Chapter 5),

\[ P[\,|X_{n_{k+1}} - X_{n_k}| \ge \varepsilon\,] \le \varepsilon^{-2}\, 2^{-2k} \]

for all $\varepsilon > 0$. By the Borel Lemma,

\[ P(\limsup_k \{\omega : |X_{n_{k+1}}(\omega) - X_{n_k}(\omega)| \ge (2/3)^k\}) = 0. \]

It follows that, for almost all $\omega$,

\[ \sum_{k=1}^{\infty} |X_{n_{k+1}}(\omega) - X_{n_k}(\omega)| < \infty. \]

Thus, for almost all $\omega$, the sequence $(X_{n_k}(\omega) : k = 1, 2, \dots)$ is a Cauchy sequence
of real numbers. For such $\omega$, let $X(\omega)$ be the limit of this sequence. Hence, $X$ is
an almost surely defined $\mathbb{R}$-valued random variable. By the Fatou Lemma and
the triangle inequality,

\[ (E(X^2))^{1/2} \le \liminf_{k\to\infty} \|X_{n_k}\|_2 \le \|X_{n_1}\|_2 + \sum_{k=1}^{\infty} \|X_{n_{k+1}} - X_{n_k}\|_2 < \infty, \]

so $X \in L_2$. Similarly,

\[ \|X - X_{n_k}\|_2 \le \sum_{m=k}^{\infty} \|X_{n_{m+1}} - X_{n_m}\|_2 \to 0 \quad \text{as } k \to \infty. \]

It follows that $X$ is the limit, in the metric space $L_2$, of the sequence $(X_{n_k})$.
Since any subsequential limit of a Cauchy sequence is the limit of the entire
sequence, $X$ is also the limit of the sequence $(X_n)$; and so $L_2$ is complete. □


Problem 4. Discuss the possibility of proving the preceding proposition by showing 
that the original sequence is Cauchy in probability rather than by showing that 
some subsequence is Cauchy a.s. 


* Problem 5. Let $X \in L_2(\Omega, \mathcal{F}, P)$ and let U denote the subspace of $L_2(\Omega, \mathcal{F}, P)$
consisting of all constant random variables. Find the orthogonal projection of X
onto U.




We have already studied almost sure convergence and convergence in prob- 
ability for sequences of random variables (and also convergence in distribution, 
treated in Chapter 14). We now have another mode of convergence, namely convergence
in $L_2$. With respect to the relationships between these various types of
convergence, we already know that almost sure convergence implies convergence 
in probability. Here is more of the story. 


Proposition 5. Suppose that the sequence $(X_n : n = 1, 2, \ldots)$ of $\mathbb{R}$-valued
random variables converges to X in $L_2$. Then the following are true:


(i) $\|X_n\|_2 \to \|X\|_2$ as $n \to \infty$;

(ii) $(X_n)$ converges to X in probability;

(iii) $E(|X_n|) \to E(|X|)$ as $n \to \infty$;

(iv) $E(|X - X_n|) \to 0$ as $n \to \infty$;

(v) $E(X_n) \to E(X)$ as $n \to \infty$;

(vi) there exists a subsequence of $(X_n)$ which converges to X almost
surely.


* Problem 6. Prove the preceding proposition. 


Problem 7. Find a convergent sequence in $L_2$ of some probability space that does
not converge almost surely.


Problem 8. Find a sequence of members of $L_2$ of some probability space that
converges almost surely to a member of $L_2$, but which does not converge in $L_2$.


Proposition 6. Let X and $X_n$, $n = 1, 2, \ldots$, be random variables on a probability
space $(\Omega, \mathcal{F}, P)$. Suppose that $X_n \to X$ a.s. as $n \to \infty$ and that, for each
n, $X_n \in L_2$. Then the following conditions are equivalent:

(i) the family $\{X_n^2 : n = 1, 2, \ldots\}$ is uniformly integrable;
(ii) $E(X^2) < \infty$ and $\lim_{n \to \infty} E((X_n - X)^2) = 0$;
(iii) $\lim_{n \to \infty} E(X_n^2) = E(X^2) < \infty$.


Problem 9. Prove the preceding proposition. 


It turns out that most of the results proved so far in this chapter for probability
spaces extend to the $\sigma$-finite setting. A few comments on how this is done are
in order. First note that the Cauchy-Schwarz Inequality can be adapted to the
general $\sigma$-finite setting, and so the assertions in Problem 2 and Problem 3 apply
in the $\sigma$-finite setting. As for the proof of Proposition 4, the relevant part of the
Borel-Cantelli Lemma is valid more generally, so Proposition 4 extends as well.
The upshot is that we can speak of $L_2$ of any $\sigma$-finite measure space, and assert
that it is a Hilbert space.




Problem 10. Show that (i), (ii), and (vi) of Proposition 5 still hold if $(\Omega, \mathcal{F}, P)$ is
replaced by an arbitrary $\sigma$-finite measure space and convergence in probability is
replaced by convergence in measure. Give an example that shows that (iii), (iv),
and (v) may fail, where, of course, expected values are to be replaced by integrals.


Proposition 5, parts (iii), (iv), and (v), and Proposition 6 do not generalize
to $\sigma$-finite measure spaces.


20.3. The metric space $L_1(\Omega, \mathcal{F}, P)$


Before treating an application of $L_2$, we introduce another metric space of
equivalence classes of random variables.

We denote by $L_1(\Omega, \mathcal{F}, P)$ the space of all equivalence classes of $\mathbb{R}$-valued
random variables on $(\Omega, \mathcal{F}, P)$ having finite expectation. It is not a Hilbert space,
but a metric space defined via the norm given by $\|X\|_1 = E(|X|)$, called the
$L_1$-norm of the member X of $L_1(\Omega, \mathcal{F}, P)$. The distance between two members X
and Y of $L_1(\Omega, \mathcal{F}, P)$ is $\|Y - X\|_1$. When there is no danger of confusion we
write $L_1$ in lieu of $L_1(\Omega, \mathcal{F}, P)$.


Problem 11. Prove the assertion just made that $L_1$ is a metric space.


Proposition 7. Let $(\Omega, \mathcal{F}, P)$ be a probability space. Then $L_1(\Omega, \mathcal{F}, P)$ is a
complete metric space.


Problem 12. Prove the preceding proposition.


Problem 13. State and prove a proposition for $L_1$ analogous to Proposition 5.


Notice that the Uniform Integrability Criterion, Theorem 12 in Chapter 8,
contains an analog of Proposition 6 for $L_1$.


Proposition 8. For any probability space, $L_2 \subseteq L_1$. Moreover, a sequence
that converges in $L_2$ also converges in $L_1$.


Problem 14. Prove the preceding proposition.


It is true that the $L_1$-concept carries over to $\sigma$-finite measure spaces, and
that even in this more general setting, $L_1$ is a complete metric space. However,
warnings for $L_1$ in the $\sigma$-finite setting analogous to those for $L_2$ are appropriate.
Moreover, Proposition 8 is not valid in infinite measure spaces.




20.4. Best linear estimator


Let $X, Y_1, Y_2, \ldots, Y_n$ be $\mathbb{R}$-valued random variables with finite second moments.
The space

$$V = \{a_0 + a_1 Y_1 + \cdots + a_n Y_n : a_0, a_1, \ldots, a_n \in \mathbb{R}\}$$

is a finite-dimensional Hilbert subspace of $L_2(\Omega, \mathcal{F}, P)$, called the Hilbert space
span of $\{1, Y_1, \ldots, Y_n\}$. Since the Hilbert space V is finite-dimensional we could
also use the term linear span. Denote the orthogonal projection of X onto V by
Z. By Proposition 2 we see that Z is the best linear estimator of X with respect
to the sequence $Y_1, \ldots, Y_n$ in the sense that

$$E((X - Z)^2) < E((X - Y)^2) \tag{20.3}$$

for every linear combination Y of $1, Y_1, \ldots, Y_n$ that is not almost surely equal to
Z. Because of the inequality (20.3), Z is also called the least squares estimate.

We use the tools of linear algebra to do two things: show how to obtain an
explicit formula for Z and prove that $E(Z) = E(X)$, thereby indicating, in combination
with (20.3), why Z is sometimes called the minimum variance unbiased
linear estimator of X. An orthonormal basis $\{1, V_1, \ldots, V_m\}$ for V can be obtained
by using the Gram-Schmidt orthonormalization procedure, a procedure
which only requires knowledge of the means, variances, and covariances of the
random variables $Y_1, \ldots, Y_n$. Then

$$Z = \langle X, 1 \rangle 1 + \langle X, V_1 \rangle V_1 + \cdots + \langle X, V_m \rangle V_m.$$

For $i = 1, \ldots, m$, $E(V_i) = \langle 1, V_i \rangle = 0$. Therefore $E(Z) = \langle X, 1 \rangle = E(X)$.
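
The projection formula can be tried out numerically. The following Python sketch is only an illustration under stated assumptions: the sample space is taken to be m equally likely points, so every expectation is an exact average, and the simulated data, the coefficients used to build X, and the helper names `inner` and `gram_schmidt` are all hypothetical choices made for this example.

```python
import math
import random

rng = random.Random(0)
m = 500  # assumed finite sample space: m equally likely points

# Hypothetical data: two observed variables and a target X.
Y1 = [rng.gauss(0, 1) for _ in range(m)]
Y2 = [0.5 * y + rng.gauss(0, 1) for y in Y1]
X = [2.0 + 1.5 * a - 0.7 * b + rng.gauss(0, 1) for a, b in zip(Y1, Y2)]


def inner(U, W):
    """Inner product <U, W> = E(UW) under the uniform measure on the m points."""
    return sum(u * w for u, w in zip(U, W)) / len(U)


def gram_schmidt(vectors):
    """Orthonormalize linearly independent vectors with respect to `inner`."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = inner(v, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(inner(w, w))  # positive when the inputs are independent
        basis.append([wi / norm for wi in w])
    return basis


one = [1.0] * m
basis = gram_schmidt([one, Y1, Y2])  # plays the role of {1, V_1, V_2}

# Z = <X,1>1 + <X,V_1>V_1 + <X,V_2>V_2, the orthogonal projection of X onto V.
Z = [0.0] * m
for b in basis:
    c = inner(X, b)
    Z = [z + c * bi for z, bi in zip(Z, b)]

residual = [x - z for x, z in zip(X, Z)]
mean_gap = abs(inner(Z, one) - inner(X, one))                      # E(Z) - E(X)
orthogonality = [abs(inner(residual, Y)) for Y in (one, Y1, Y2)]   # <X - Z, Y>
```

Up to rounding error, `mean_gap` and every entry of `orthogonality` vanish, which matches E(Z) = E(X) and the orthogonality of X - Z to V.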


* Problem 15. Let $X, Y_1, \ldots, Y_n$ and Z be as in the preceding example. Show that
the random variable $X - Z$ is uncorrelated with each of the random variables
$Y_1, \ldots, Y_n$.


Problem 16. Let $X, X_1, X_2, \ldots, X_n$ be independent random variables in $L_2$ and
let $Y_i = X + X_i$ for $i = 1, \ldots, n$. Find the best linear estimator of X with respect
to $Y_1, \ldots, Y_n$. Hint: Use the cases n = 1, 2 to guess at the general answer; then
check your conjecture.


One can interpret the random variables $X_i$ in the preceding problem as noise.
Each time we try to observe X, we instead observe the sum of X and noise. From 
what we observe, we then try to find a linear combination of the observations 
that best estimates the true signal X. 


CHAPTER 21 
Conditional Probabilities 


In this chapter we introduce two closely related concepts: conditional probabil- 
ities and conditional probability distributions. These two mathematical objects 
are vital in the construction and analysis of nearly all of the random sequences 
and stochastic processes that form the chief subject matter of the latter part of 
this book (Chapters 24 through 33). 

We first show how to use Hilbert space ideas to construct conditional prob- 
abilities. The concept ‘conditional probability’ is then extended to ‘conditional 
distribution’ and the related ‘conditional density’. Conditional distributions do 
not always exist; a useful result concerning their existence is found in the fourth 
section. Next, conditional independence is treated. Using the notion of con- 
ditional independence, we also define an important class of random sequences 
known as ‘Markov sequences’. The chapter concludes with an illustrative exam- 
ple involving conditional distributions for normally distributed random variables. 


21.1. The construction of conditional probabilities 


Consider a simple symmetric random walk S in $\mathbb{Z}$, with increments $X_n = S_n - S_{n-1}$.
Let A be the event $\{\omega : S_5(\omega) > 0\}$. From symmetry considerations we
know that P(A) = 1/2. Our assessment of the probability changes, however, as
we observe the values of $S_1, S_2, S_3$, and $S_4$. In particular, if we consider $S_4$ we
find that there are three different situations. If $S_4(\omega) > 0$, then, since $S_4$ is
even and so at least 2, it is necessarily the case that $S_5(\omega) > 0$. Similarly, if
$S_4(\omega) < 0$, then $S_5(\omega) < 0$. Finally, if $S_4(\omega) = 0$, then the sign of $S_5(\omega)$
is the same as that of the increment $X_5(\omega)$.
We say that the 'conditional probability of A, given $S_4$' is 1 if $S_4 > 0$, 1/2 if
$S_4 = 0$, and 0 if $S_4 < 0$.

We have just described a special case of the following more general situation.
We start with a probability space $(\Omega, \mathcal{F}, P)$. The probability measure P represents
our initial assessments of the probabilities of the various events in $\mathcal{F}$. We
are then given some information about which of the events in a certain collection
$\mathcal{G}$ have occurred and which have not. The reader can use the Sierpiński Class
Theorem to see that we might as well assume that $\mathcal{G}$ is a $\sigma$-field.

We think of $\mathcal{G}$ as representing information; if $\mathcal{G} = \mathcal{F}$, then $\mathcal{G}$ represents the
information possessed by an observer who has seen the outcome of the complete
experiment. If $\mathcal{G}$ equals the $\sigma$-field $\{\Omega, \emptyset\}$, then it represents the information
possessed by one who knows only the probability space for the experiment but
has no information about the outcome. Other $\sigma$-fields $\mathcal{G}$ such that $\mathcal{G} \subseteq \mathcal{F}$
represent partial information about the outcome of the experiment. Given such
a $\sigma$-field $\mathcal{G}$ and an event $A \in \mathcal{F}$, we wish to define the 'conditional probability
of A given $\mathcal{G}$', which represents our updated assessment of the probability of
A, based on the information contained in $\mathcal{G}$. The notation for this conditional
probability is '$P(A \mid \mathcal{G})$'. It is not a constant, but a random variable. For the
random walk example above, $\mathcal{G} = \sigma(S_1, S_2, S_3, S_4)$ and

$$P(A \mid \mathcal{G})(\omega) = \begin{cases} 1 & \text{if } S_4(\omega) > 0 \\ 1/2 & \text{if } S_4(\omega) = 0 \\ 0 & \text{if } S_4(\omega) < 0. \end{cases}$$

The same formula holds if $\mathcal{G} = \sigma(S_4)$. An interpretation of this fact is that
$S_4$, by itself, gives as much information about the sign of $S_5$ as does the vector
$(S_1, S_2, S_3, S_4)$.
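
Because only the first five steps matter for the event A, the displayed formula can be checked by brute force. The Python sketch below enumerates the 32 equally likely sign patterns; the helper name `cond_prob` and the use of counting in place of integration are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import product

# The 32 equally likely outcomes of the first five steps of the walk.
paths = list(product((-1, 1), repeat=5))


def cond_prob(event, info):
    """Map each observed value of `info` to P(event | info = value), by counting."""
    hits, totals = {}, {}
    for w in paths:
        key = info(w)
        totals[key] = totals.get(key, 0) + 1
        hits[key] = hits.get(key, 0) + (1 if event(w) else 0)
    return {key: Fraction(hits[key], totals[key]) for key in totals}


A = lambda w: sum(w) > 0                                      # the event {S_5 > 0}
S4 = lambda w: sum(w[:4])                                     # information in sigma(S_4)
S1_to_S4 = lambda w: tuple(sum(w[:k]) for k in range(1, 5))   # sigma(S_1, ..., S_4)

given_S4 = cond_prob(A, S4)
given_all = cond_prob(A, S1_to_S4)
```

Here `given_S4` takes the values 0, 1/2, and 1 according to the sign of $S_4$, and each entry of `given_all` agrees with the entry of `given_S4` indexed by the last coordinate of its key, in line with the remark that the whole vector carries no extra information about the sign of $S_5$.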

We use the concept of orthogonal projection in a Hilbert space to define
conditional probabilities. Let $(\Omega, \mathcal{F}, P)$ be a probability space and $\mathcal{G}$ be a
sub-$\sigma$-field of $\mathcal{F}$. We wish to think of the Hilbert space $L_2(\Omega, \mathcal{G}, P)$ as a Hilbert subspace
of the Hilbert space $L_2(\Omega, \mathcal{F}, P)$. Strictly speaking, $L_2(\Omega, \mathcal{G}, P)$ is not necessarily
a subset of $L_2(\Omega, \mathcal{F}, P)$, since the equivalence class of a random variable X in
$L_2(\Omega, \mathcal{G}, P)$ will typically be smaller than the corresponding equivalence class of
X in $L_2(\Omega, \mathcal{F}, P)$: a random variable in $L_2(\Omega, \mathcal{G}, P)$ can be almost surely equal to
many random variables in $L_2(\Omega, \mathcal{F}, P)$ that are not $\mathcal{G}$-measurable. Nevertheless,
if we identify the equivalence class of X in $L_2(\Omega, \mathcal{G}, P)$ with the equivalence class
of X in $L_2(\Omega, \mathcal{F}, P)$, then we may identify $L_2(\Omega, \mathcal{G}, P)$ with a Hilbert subspace
of $L_2(\Omega, \mathcal{F}, P)$. Hereafter we will make this identification without comment.


Definition 1. Let A be an event in a probability space $(\Omega, \mathcal{F}, P)$ and let $\mathcal{G}$
denote a sub-$\sigma$-field of $\mathcal{F}$. The conditional probability of A given $\mathcal{G}$, denoted
by $P(A \mid \mathcal{G})$, is the projection of $I_A \in L_2(\Omega, \mathcal{F}, P)$ onto $L_2(\Omega, \mathcal{G}, P)$. That
is, $P(A \mid \mathcal{G})$ is the equivalence class of $\mathcal{G}$-measurable random variables X that
satisfy the equation

$$E(XY) = E(I_A Y)$$

for all $Y \in L_2(\Omega, \mathcal{G}, P)$.

In practice, it is cumbersome to work with equivalence classes, so we usually
speak as if $P(A \mid \mathcal{G})$ denotes a single, arbitrarily chosen $\mathcal{G}$-measurable random
variable in the equivalence class obtained by projecting $I_A$ onto $L_2(\Omega, \mathcal{G}, P)$.
Thus, statements of equality of conditional probabilities should technically have
the phrase 'almost surely' attached to them, but this phrase is often dropped
and one speaks as if conditional probabilities are unique random variables rather
than unique equivalence classes of random variables.

The next proposition says that in testing whether a particular random variable 
equals P(A | G), one may focus on its relation to indicator functions of members 
of G. 


Proposition 2. Let A be an event in a probability space $(\Omega, \mathcal{F}, P)$, X an
$\mathbb{R}$-valued random variable, and $\mathcal{G}$ a sub-$\sigma$-field of $\mathcal{F}$. Then $X = P(A \mid \mathcal{G})$ a.s. if
and only if X satisfies the following two conditions:


(i) X is $\mathcal{G}$-measurable;
(ii) for all $B \in \mathcal{G}$, $E(X I_B) = P(A \cap B)$.


PROOF. The 'only if' direction of the result is immediate from the definition.
For the 'if' direction, let X be any random variable satisfying conditions (i) and
(ii). Then $Z = X - P(A \mid \mathcal{G})$ is a $\mathcal{G}$-measurable random variable for which
$E(Z I_B) = 0$ for every $B \in \mathcal{G}$. It follows that P(B) = 0 if B is the set where
Z is positive and also if B is the set where Z is negative. Thus, Z = 0 a.s., as
desired. □


The special case E(X) = P(A) of condition (ii) in Proposition 2, obtained by
taking $B = \Omega$, is a fact that is used often without reference to the proposition.


Problem 1. Let P and $\mathcal{G}$ be as in Definition 1 and $A \in \mathcal{G}$. Give two proofs that
$P(A \mid \mathcal{G}) = I_A$, one using Definition 1 and the other Proposition 2.


Problem 2. Let P and $\mathcal{G}$ be as in Definition 1, and $\mathcal{H}$ be a sub-$\sigma$-field of $\mathcal{G}$. Suppose
that $P(A \mid \mathcal{G})$ is $\mathcal{H}$-measurable for some event A. Give two proofs that
$P(A \mid \mathcal{G}) = P(A \mid \mathcal{H})$.


Problem 1 can be generalized as follows.


Proposition 3. Let A and B be events in a probability space $(\Omega, \mathcal{F}, P)$ and
suppose that A is a member of a sub-$\sigma$-field $\mathcal{G}$ of $\mathcal{F}$. Then

$$P(A \cap B \mid \mathcal{G}) = P(B \mid \mathcal{G})\, I_A \quad \text{a.s.}$$


* Problem 3. Give two proofs of the preceding proposition, one using Definition 1
and the other Proposition 2.


Problem 4. Let A be an event in a probability space $(\Omega, \mathcal{F}, P)$ and suppose that
A is independent of a sub-$\sigma$-field $\mathcal{G}$ of $\mathcal{F}$. By using Definition 1 and then again
by using Proposition 2, prove that $P(A \mid \mathcal{G}) = P(A)$ and, in particular, that the
probability of A given $\{\Omega, \emptyset\}$ is the constant random variable P(A).




In the following proposition, which is a generalization of the preceding problem,
conditioning with respect to a $\sigma$-field $\sigma(\mathcal{G}, \mathcal{H})$ is considered. But, for simplicity,
$\mathcal{G}, \mathcal{H}$, rather than $\sigma(\mathcal{G}, \mathcal{H})$, is written, there being no danger of confusion
because conditioning is always with respect to a $\sigma$-field.


Proposition 4. Let $\mathcal{G}$ and $\mathcal{H}$ be sub-$\sigma$-fields of the $\sigma$-field $\mathcal{F}$ of events in a
probability space $(\Omega, \mathcal{F}, P)$ and let A be an event. Suppose that $\sigma(A, \mathcal{H})$ and $\mathcal{G}$
are independent. Then $P(A \mid \mathcal{G}, \mathcal{H}) = P(A \mid \mathcal{H})$ a.s.


PROOF. The plan is to show that $P(A \mid \mathcal{H})$ satisfies the two conditions given
in Proposition 2 for a random variable to equal $P(A \mid \mathcal{G}, \mathcal{H})$. It obviously satisfies
the first of the two conditions since $P(A \mid \mathcal{H})$, being $\mathcal{H}$-measurable, is
$\sigma(\mathcal{G}, \mathcal{H})$-measurable. For the second condition we need to show that

$$E(P(A \mid \mathcal{H})\, I_B) = P(A \cap B)$$

for $B \in \sigma(\mathcal{G}, \mathcal{H})$. The collection of those B for which this relation holds is clearly
a Sierpiński class, so we may restrict our attention to B of the form $C \cap D$, where
$C \in \mathcal{G}$ and $D \in \mathcal{H}$. We calculate

$$E(P(A \mid \mathcal{H})\, I_{C \cap D}) = E(P(A \mid \mathcal{H})\, I_C I_D) = E(P(A \cap D \mid \mathcal{H})\, I_C),$$

the last equality being a consequence of Proposition 3. Since $\mathcal{G}$ and $\sigma(A, \mathcal{H})$ are
independent, $P(A \cap D \mid \mathcal{H})$ is $\mathcal{H}$-measurable, and $C \in \mathcal{G}$, we may continue the
calculation as follows:

$$E(P(A \cap D \mid \mathcal{H})\, I_C) = E(P(A \cap D \mid \mathcal{H}))\, E(I_C) = P(A \cap D)\, P(C) = P((A \cap D) \cap C) = P(A \cap (C \cap D)),$$

giving us the desired equality. □


The intuition behind Proposition 4 is that if every event formed from A and H 
is independent of G, then, once A has been conditioned by H, further conditioning 
by G is irrelevant. It is important to note, as indicated by Problem 42, that the 
conditioning on G is not necessarily irrelevant if A and H are only separately 
independent of G. 

The next problem is very important for further developing one’s intuition. 


* Problem 5. Let A and C be events and suppose that 0 < P(C) < 1. Prove that

$$P(A \mid \sigma(C)) = \frac{P(A \cap C)}{P(C)}\, I_C + \frac{P(A \cap C^c)}{P(C^c)}\, I_{C^c}. \tag{21.1}$$


With reference to the preceding problem: if $\omega \in C$, then

$$P(A \mid \sigma(C))(\omega) = \frac{P(A \cap C)}{P(C)},$$

this being the number that one would use in trying to decide whether a proposed
bet on A is a good bet in light of the knowledge that C has occurred. We will
use the notation $P(A \mid C)$ for the constant $P(A \cap C)/P(C)$ (provided that
P(C) > 0). Thus, if 0 < P(C) < 1,

$$P(A \mid \sigma(C))(\omega) = \begin{cases} P(A \mid C) & \text{if } \omega \in C \\ P(A \mid C^c) & \text{if } \omega \notin C. \end{cases}$$

The number $P(A \mid C^c)$ is the number one would want to use for deciding about
a bet on the event A given the information that C has not occurred.

One might take the point of view that, in an actual running of a single experi- 
ment, one is interested in only one of the two numbers obtained in the preceding 
paragraph, not in both. In this connection consider a situation in which an unseen
machine is flipping a fair coin five times. You are considering a bet that at least 
three flips have resulted in heads. As you are contemplating, someone who has 
seen the result tells you that the first two flips were heads. This information 
would seem to increase the odds of there being at least three heads, but that is 
not necessarily the case. For instance, he might have intended to tell you the 
outcomes of the first two flips only in case the last three flips were all tails. Un- 
less either there is a prior algorithm concerning information he will give you or 
one has actually incorporated his possibly random behavior into the probability 
space structure, one cannot calculate meaningful conditional probabilities. The 
mathematician’s resolution of this issue is to require all conditioning to be with 
respect to a $\sigma$-field. The relevant $\sigma$-field for Problem 5 is $\{C, C^c, \Omega, \emptyset\}$, which
corresponds to the information: C occurred or not.
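
The coin-flipping discussion can be made concrete by enumeration. The Python sketch below computes the two constants in formula (21.1) for A = 'at least three heads in five flips' and C = 'the first two flips are heads', and then the probability of A given the narrower event D on which the hypothetical informant described above would actually speak. The helper names are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import product

flips = list(product("HT", repeat=5))   # 32 equally likely outcomes


def prob(event):
    """Probability of an event, given as a predicate on outcomes."""
    return Fraction(sum(1 for w in flips if event(w)), len(flips))


A = lambda w: w.count("H") >= 3                   # at least three heads
C = lambda w: w[0] == "H" and w[1] == "H"         # first two flips are heads
D = lambda w: C(w) and w[2:] == ("T", "T", "T")   # informant speaks only in this case

p_A_given_C = prob(lambda w: A(w) and C(w)) / prob(C)
p_A_given_not_C = prob(lambda w: A(w) and not C(w)) / prob(lambda w: not C(w))
p_A_given_D = prob(lambda w: A(w) and D(w)) / prob(D)


def cond_prob_sigma_C(w):
    """The random variable P(A | sigma(C)) of formula (21.1), evaluated at w."""
    return p_A_given_C if C(w) else p_A_given_not_C


# Averaging the conditional probability recovers the unconditional one.
average = sum(cond_prob_sigma_C(w) for w in flips) / len(flips)
```

The values are $P(A \mid C) = 7/8$ and $P(A \mid C^c) = 3/8$, whereas $P(A \mid D) = 0$: the same spoken sentence leads to very different assessments depending on the $\sigma$-field that models the informant's behavior.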

Problem 5 can be generalized. We call a $\sigma$-field purely atomic if it is generated
by a countable (possibly finite) partition $(C_1, C_2, \ldots)$ of the probability space.


Problem 6. Let $X = (X_1, X_2, \ldots)$ be a random sequence of $(\Psi, \mathcal{H})$-valued random
variables. Assume that $\Psi$ is a countable set. Prove that each of the $\sigma$-fields
$\sigma(X_1, X_2, \ldots, X_n)$ is purely atomic. Is the $\sigma$-field $\sigma(X)$ necessarily purely atomic?


Proposition 5. Let $\mathcal{G}$ be a purely atomic $\sigma$-field generated by a partition
$(C_j : j = 1, 2, \ldots)$ of a probability space $(\Omega, \mathcal{F}, P)$, and let $A \in \mathcal{F}$. Then

$$P(A \mid \mathcal{G}) = \sum_{j :\, P(C_j) > 0} P(A \mid C_j)\, I_{C_j} \quad \text{a.s.} \tag{21.2}$$


Problem 7. Prove the preceding proposition. 


Example 1. Let $(\Omega, \mathcal{F}, P)$ denote a fair-coin-flip space in which $-1$ is used
to denote tails and $+1$ heads; thus $\omega \in \Omega$ can be written as $(\omega_1, \omega_2, \ldots)$ where
each $\omega_n$ equals $-1$ or $1$. For $n = 0, 1, 2, \ldots$, let

$$S_n(\omega) = \omega_1 + \cdots + \omega_n,$$

where the empty sum $S_0(\omega)$ equals 0. Then $S = (S_0, S_1, \ldots)$ is a simple symmetric
random walk on $\mathbb{Z}$. We wish to calculate conditional probabilities of arbitrary
events given $\mathcal{G} = \sigma(S_4)$.

For $k = -4, -2, 0, 2, 4$, let

$$C_k = \{\omega = (\omega_n) \in \Omega : \omega_1 + \omega_2 + \omega_3 + \omega_4 = k\}.$$

Clearly $\{C_{-4}, C_{-2}, C_0, C_2, C_4\}$ is a partition of $\Omega$, and $\mathcal{G}$ is generated by this
partition. Easy calculations give

$$P(C_0) = \tfrac{3}{8}, \qquad P(C_{-2}) = P(C_2) = \tfrac{1}{4}, \qquad P(C_{-4}) = P(C_4) = \tfrac{1}{16}.$$

Thus,

$$P(A \mid \mathcal{G}) = 16 P(A \cap C_{-4}) I_{C_{-4}} + 4 P(A \cap C_{-2}) I_{C_{-2}} + \tfrac{8}{3} P(A \cap C_0) I_{C_0} + 4 P(A \cap C_2) I_{C_2} + 16 P(A \cap C_4) I_{C_4}.$$

Let us use this formula for $A = \{\omega : S_5(\omega) > 0\}$. Then $A \cap C_{-4} = A \cap C_{-2} = \emptyset$
and $C_2 \cup C_4 \subseteq A$. The event $A \cap C_0$ consists of those members of $C_0$ whose fifth
term is $+1$, and so its probability equals $\tfrac{1}{2} P(C_0)$. The upshot is

$$P(A \mid \mathcal{G}) = \tfrac{1}{2} I_{C_0} + I_{C_2} + I_{C_4}.$$

In particular, if $\omega = (1, -1, -1, 1, -1, -1, \ldots)$ and $\tilde{\omega} = (1, -1, -1, 1, 1, -1, \ldots)$,
then

$$P(A \mid \mathcal{G})(\omega) = P(A \mid \mathcal{G})(\tilde{\omega}) = \tfrac{1}{2}$$

even though $\omega \notin A$ and $\tilde{\omega} \in A$. This last sentence does not indicate any error.
Rather, the point is that when we are conditioning on $\mathcal{G}$ and wish to evaluate
a conditional probability at some sample point, we do not distinguish between
two different sample points if they agree in the first four coordinates.
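
The numbers in Example 1 can be recomputed mechanically from formula (21.2). The following Python sketch restricts attention to the first five coordinates, which is all that the event A and the $\sigma$-field $\mathcal{G}$ depend on; the names `omega5`, `atom_prob`, and `cond_prob_given_S4` are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction
from itertools import product

# First five coordinates of a sample point; each pattern has probability 1/32.
omega5 = list(product((-1, 1), repeat=5))


def prob(event):
    """Probability of an event that depends only on the first five coordinates."""
    return Fraction(sum(1 for w in omega5 if event(w)), len(omega5))


A = lambda w: sum(w) > 0   # the event {S_5 > 0}

# P(C_k) for the atoms C_k = {S_4 = k} of the partition generating G.
atom_prob = {k: prob(lambda w, k=k: sum(w[:4]) == k) for k in (-4, -2, 0, 2, 4)}


def cond_prob_given_S4(w):
    """P(A | G)(w) from formula (21.2): P(A and C_k) / P(C_k) on the atom containing w."""
    k = sum(w[:4])
    return prob(lambda v: A(v) and sum(v[:4]) == k) / atom_prob[k]


w1 = (1, -1, -1, 1, -1)   # first five coordinates of a point not in A
w2 = (1, -1, -1, 1, 1)    # first five coordinates of a point in A
```

Evaluating `cond_prob_given_S4` at `w1` and at `w2` gives 1/2 in both cases, since the two points lie in the same atom $C_0$.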


It is often the case when writing expressions for conditional probabilities that 
two different symbols are needed for generic points in the sample space. For 
instance, we might write 


$$P(\{\omega' : S_5(\omega') = 1\} \mid \mathcal{G})(\omega).$$


Of course, the roles of $\omega$ and $\omega'$ could be reversed, or another letter could be
introduced.


* Problem 8. For S and $\mathcal{G}$ as in Example 1 compute:

(i) $P(\{\omega' : S_7(\omega') > 0\} \mid \mathcal{G})$,

(ii) $P(\{\omega' : S_6(\omega') = 2\} \mid \mathcal{G})$,

(iii) $P(\{\psi : S_6(\psi) = 2\} \mid \mathcal{G})$,

(iv) $P(\{\psi : S_2(\psi) = 0,\ S_6(\psi) = 2\} \mid \mathcal{G})$,

(v) $P(\{\omega' : S_2(\omega') = 0,\ S_6(\omega) = 2\} \mid \mathcal{G})$.

In particular, evaluate each of these random variables at $\omega$, where $\omega_1 = \omega_3 = 1$
and $\omega_n = -1$ for all $n \ne 1, 3$.


* Problem 9. Let $(\Omega, \mathcal{F}, P)$ be as in Example 1. Obtain a formula for the conditional
probability of an arbitrary event given $\mathcal{H} = \sigma(S_1, S_2, S_3, S_4)$. Repeat Problem 8
with $\mathcal{G}$ replaced by $\mathcal{H}$.


The following result indicates that conditional probabilities have much in 
common with (unconditional) probabilities. 


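Proposition 6. Let $\mathcal{G}$ be a sub-$\sigma$-field of the $\sigma$-field $\mathcal{F}$ of events in a probability
space $(\Omega, \mathcal{F}, P)$. Then (i) $P(\Omega \mid \mathcal{G}) = 1$ a.s., (ii) $0 \le P(A \mid \mathcal{G}) \le 1$ a.s. for each
$A \in \mathcal{F}$, and (iii)

$$P\Bigl(\bigcup_{n=1}^{\infty} A_n \Bigm| \mathcal{G}\Bigr) = \sum_{n=1}^{\infty} P(A_n \mid \mathcal{G}) \quad \text{a.s.}$$

for disjoint members $A_1, A_2, \ldots$ of $\mathcal{F}$.


PROOF. To prove (i) we note that the constant function 1 is $\mathcal{G}$-measurable.
For any $B \in \mathcal{G}$, $E(1 \cdot I_B) = P(B) = P(\Omega \cap B)$. By Proposition 2, 1 is a conditional
probability of $\Omega$ given $\mathcal{G}$.

To prove (ii) we first let $B = \{\omega : P(A \mid \mathcal{G})(\omega) > 1\}$ and, for a proof by
contradiction, we suppose that P(B) > 0. Then

$$E(P(A \mid \mathcal{G})\, I_B) > E(I_B) = P(B) \ge P(A \cap B) = E(P(A \mid \mathcal{G})\, I_B),$$

the last step following from the second condition of Proposition 2. The contradiction
we have reached shows $P(A \mid \mathcal{G}) \le 1$ a.s. A similar argument shows that
$P(A \mid \mathcal{G}) \ge 0$ a.s.

To prove (iii) we show that $\sum_n P(A_n \mid \mathcal{G})$ satisfies the defining properties of
$P(\bigcup_n A_n \mid \mathcal{G})$. It is certainly measurable with respect to $\mathcal{G}$. To check the second
condition of Proposition 2 we let $B \in \mathcal{G}$ and use the Monotone Convergence
Theorem in conjunction with the second condition of Proposition 2 for each
$P(A_n \mid \mathcal{G})$ to calculate

$$E\Bigl(\Bigl(\sum_n P(A_n \mid \mathcal{G})\Bigr) I_B\Bigr) = \sum_n E(P(A_n \mid \mathcal{G})\, I_B) = \sum_n P(A_n \cap B) = P\Bigl(\Bigl(\bigcup_n A_n\Bigr) \cap B\Bigr). \qquad \square$$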


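* Problem 10. [Conditional Borel Lemma] Prove that

$$P\bigl(\limsup_{n \to \infty} A_n \bigm| \mathcal{G}\bigr)(\omega) = 0$$

for almost every $\omega \in \Omega$ for which $\sum_{n=1}^{\infty} P(A_n \mid \mathcal{G})(\omega) < \infty$. Then conclude that

$$P\bigl(\limsup_{n \to \infty} A_n\bigr) \le P\Bigl(\Bigl\{\omega : \sum_{n=1}^{\infty} P(A_n \mid \mathcal{G})(\omega) = \infty\Bigr\}\Bigr).$$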


In the following example a conditional probability must be computed without 
the help of Proposition 5. The technique applied is that of first making an 
educated guess and then using Proposition 2 to verify that guess. 


Example 2. Let X be a random variable uniformly distributed on the interval
$[-1, 2]$. Let $\mathcal{G} = \sigma(|X|)$. We set the goal of calculating $P(A \mid \mathcal{G})$, where

$$A = \{\omega : X(\omega) > 0\}.$$

We reason that if $|X| > 1$, we then know that $X > 0$. On the other hand, if
$|X| \le 1$, we feel that the sign of X is determined by a coin flip. Therefore, we
conjecture

$$P(A \mid \mathcal{G}) = Y \quad \text{a.s.},$$

where

$$Y = I_{\{\omega :\, |X(\omega)| > 1\}} + \tfrac{1}{2}\, I_{\{\omega :\, |X(\omega)| \le 1\}}.$$

Condition (i) of Proposition 2 is clearly satisfied. For condition (ii), let $B \in \mathcal{G}$
and write $B = B_1 \cup B_2$, where

$$B_1 = B \cap \{\omega : |X(\omega)| > 1\} \quad \text{and} \quad B_2 = B \cap \{\omega : |X(\omega)| \le 1\}.$$

Then

$$E(Y I_B) = E(Y I_{B_1}) + E(Y I_{B_2}) = P(B_1) + \tfrac{1}{2} P(B_2) = P(A \cap B_1) + \tfrac{1}{2} P[X \in C_2],$$

where $C_2$ is a certain Borel subset of $[-1, 1]$ having the property that $x \in C_2$ if
and only if $-x \in C_2$. Since

$$\tfrac{1}{2} P[X \in C_2] = P[X \in C_2,\ X > 0] = P(A \cap B_2),$$

we obtain

$$E(Y I_B) = P(A \cap B_1) + P(A \cap B_2) = P(A \cap B),$$

as desired.
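
As a sanity check on the guess in Example 2 (not a substitute for the verification just given), the Python sketch below evaluates both sides of condition (ii) exactly for events of the special form $B = \{a \le |X| \le b\}$ with $0 \le a \le b \le 2$. Restricting to such intervals, and the helper names used, are assumptions made for this illustration.

```python
from fractions import Fraction

DENSITY = Fraction(1, 3)   # X is uniform on [-1, 2], so its density is 1/3 there


def overlap(a, b, lo, hi):
    """Length of the intersection of [a, b] with [lo, hi] (zero if empty)."""
    return max(Fraction(0), min(b, hi) - max(a, lo))


def prob_X_in(a, b):
    """P(a <= X <= b) for X uniform on [-1, 2]."""
    return DENSITY * overlap(a, b, -1, 2)


def prob_A_and_B(a, b):
    """P(A and B) = P(X > 0, a <= |X| <= b), assuming 0 <= a <= b."""
    return prob_X_in(a, b)


def expect_Y_on_B(a, b):
    """E(Y I_B) with Y = 1 on {|X| > 1} and Y = 1/2 on {|X| <= 1}."""
    # Part of B where |X| > 1 (both signs of X are considered).
    p_big = prob_X_in(max(a, 1), b) + prob_X_in(-b, -max(a, 1))
    # Part of B where |X| <= 1 (both signs of X are considered).
    p_small = prob_X_in(a, min(b, 1)) + prob_X_in(-min(b, 1), -a)
    return p_big + Fraction(1, 2) * p_small
```

For instance, with a = 1/2 and b = 3/2 both `expect_Y_on_B` and `prob_A_and_B` return 1/3.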


Problem 11. Replace the event A in the preceding example by 


$$\{\omega : X(\omega) < 1/2\}.$$




* Problem 12. Let $\Omega$ denote the unit square, $\mathcal{F}$ the Borel $\sigma$-field of subsets of $\Omega$, P
two-dimensional Lebesgue measure, and $\mathcal{G}$ the $\sigma$-field of sets of the form

$$\{(\omega_1, \omega_2) \in \Omega : \omega_1 \in C\},$$

C a Borel subset of the unit interval. Calculate

$$P(\{(\omega_1, \omega_2) : \omega_1 > \omega_2\} \mid \mathcal{G}).$$


It is often the case that we wish to condition on a $\sigma$-field $\mathcal{G}$ of the form
$\mathcal{G} = \sigma(X)$ for some random variable X. In this case conditional probabilities
have a particularly nice form, as a consequence of the following lemma.


Lemma 7. Let V be a $\Psi$-valued random variable and Y a $\sigma(V)$-measurable
$\overline{\mathbb{R}}$-valued random variable. Then there exists a measurable function $f : \Psi \to \overline{\mathbb{R}}$
such that $Y = f \circ V$.


PROOF. Let $\mathcal{C}$ be the collection of functions $Y : \Omega \to \overline{\mathbb{R}}$ that can be written as
$f \circ V$ for some measurable function $f : \Psi \to \overline{\mathbb{R}}$. We first show that $\mathcal{C}$ contains all
$\mathbb{R}$-valued simple functions that are $\sigma(V)$-measurable. Since such functions can be
written as a finite sum $\sum_k a_k I_{A_k}$ for constants $a_k \in \mathbb{R}$ and pairwise disjoint sets
$A_k \in \sigma(V)$, it is enough to show that $\mathcal{C}$ contains all indicator functions of sets
in $\sigma(V)$. If A is such a set, then $A = V^{-1}(B)$ for some measurable $B \subseteq \Psi$, so $I_A = I_B \circ V$.

We next show that $\mathcal{C}$ is closed under increasing pointwise limits. Let
$(Y_1 \le Y_2 \le \ldots)$ be an increasing sequence of members of $\mathcal{C}$, and write $Y_n = f_n \circ V$,
for $n = 1, 2, \ldots$, where each $f_n$ is a measurable function from $\Psi$ to $\overline{\mathbb{R}}$. Define
$g_n = \max\{f_1, \ldots, f_n\}$ for $n = 1, 2, \ldots$. Each $g_n$ is measurable and $Y_n = g_n \circ V$
since the sequence $(f_n : n = 1, 2, \ldots)$ must be increasing on the image of V. It
follows that $\lim_n Y_n = g \circ V$, where $g = \lim_n g_n$. Thus $\mathcal{C}$ is closed under increasing
pointwise limits. By the preceding paragraph of this proof and Lemma 13 of
Chapter 2, $\mathcal{C}$ contains all $\overline{\mathbb{R}}^+$-valued $\sigma(V)$-measurable functions.

Suppose now that Y is $\overline{\mathbb{R}}$-valued and $\sigma(V)$-measurable. Then $Y = Y^+ - Y^-$,
where each of $Y^+$ and $Y^-$ is $\overline{\mathbb{R}}^+$-valued and $\sigma(V)$-measurable. Thus there exist
measurable functions g and h from $\Psi$ to $\overline{\mathbb{R}}^+$ such that

$$Y = g \circ V - h \circ V. \tag{21.3}$$

Since this last expression is meaningful, the set $D = \{\psi : g(\psi) = h(\psi) = \infty\}$ is
disjoint from the image of V. Therefore, we may redefine $h(\psi) = 0$ for $\psi \in D$
and maintain the truth of (21.3). Then $Y = (g - h) \circ V$. □
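
On a finite sample space the content of Lemma 7 is elementary: Y is a function of V exactly when Y is constant on each level set of V. The Python sketch below builds the function f as a lookup table in that setting; the function name `factor_through` and the toy choices of sample space, V, and Y are assumptions made for this illustration and play no role in the proof above.

```python
def factor_through(V, Y, omega):
    """Return f (as a dict) with Y(w) == f[V(w)] for every w in omega.

    Raises ValueError when Y takes two different values on a level set of V,
    in which case no such f exists.
    """
    f = {}
    for w in omega:
        v, y = V(w), Y(w)
        # setdefault stores y the first time v is seen, and returns the stored value.
        if f.setdefault(v, y) != y:
            raise ValueError("Y is not constant on a level set of V")
    return f


omega = range(-3, 4)                       # toy sample space {-3, ..., 3}
V = abs                                    # V(w) = |w|
f = factor_through(V, lambda w: w * w + 1, omega)
```

Here Y(w) = w*w + 1 depends on w only through |w|, so the table `f` is produced; by contrast, Y(w) = w is not a function of |w|, and `factor_through` raises an error for it.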


* Problem 13. Why, near the end of the preceding proof, did we not just redefine
$h(\psi)$ to equal 0 for $\psi$ not in the image of V?




Proposition 8. Let V be a $(\Psi, \mathcal{H})$-valued random variable defined on a probability
space $(\Omega, \mathcal{F}, P)$ and let $A \in \mathcal{F}$. Then there exists a $[0, 1]$-valued measurable
function p on $(\Psi, \mathcal{H})$ for which $P(A \mid \sigma(V)) = p \circ V$ a.s.


Problem 14. Deduce this proposition from Lemma 7, being careful to say why p
can be chosen to be $[0, 1]$-valued.


The following notation will sometimes be used for the function p described in
Proposition 8:

$$P(A \mid V = v) = p(v),$$

a notation that gives rise to the intuitive phrase, "the probability of A given that
V equals v".


Problem 15. For the setting of Example 2, find the function

$$x \rightsquigarrow P(A \mid |X| = x).$$


Problem 16. For the setting of Problem 11, find the function

$$x \rightsquigarrow P(A \mid |X| = x).$$


* Problem 17. For the setting of Problem 12, find the function

$$v \rightsquigarrow P(\{(\omega_1, \omega_2) : \omega_1 > \omega_2\} \mid V = v),$$

where $V(\omega_1, \omega_2) = \omega_1$.


21.2. Conditional distributions 


Conditional probabilities are $[0, 1]$-valued random variables. This property corresponds
to the fact that (unconditional) probabilities are $[0, 1]$-valued. Analogously,
we will define 'conditional distributions' so that they are distribution-valued
random variables. The following discussion of notation is preparation for
this point of view.

Let $(\Omega, \mathcal{F})$ and $(\Psi, \mathcal{H})$ be measurable spaces, and let $\mathcal{Q}$ denote the space of
all probability measures on $(\Psi, \mathcal{H})$. To say that Z is a $\mathcal{Q}$-valued function with
domain $\Omega$ is to say that for each $\omega$, $Z(\omega)$ is a probability measure. We could
denote the probability given by $Z(\omega)$ to a measurable set B by $[Z(\omega)](B)$ but will
not use this notation. Instead, in order to avoid the clutter of extra parentheses,
we express this quantity as

$$Z(\omega, B).$$

When we want to refer to the $[0, 1]$-valued function $\omega \rightsquigarrow Z(\omega, B)$, where B is a
fixed measurable set, we will often use the notation

$$Z(\cdot, B).$$




When $\omega$ is fixed, the probability measure $B \rightsquigarrow Z(\omega, B)$ will sometimes be denoted
by $Z(\omega, \cdot)$. It is natural in many cases to suppress one of the arguments,
writing

$$Z(B) \text{ for } Z(\cdot, B) \quad \text{and} \quad Z(\omega) \text{ for } Z(\omega, \cdot).$$

The practice of eliminating the $\omega$ will be particularly prevalent when we introduce
a special notation later for conditional distributions (see the discussion following
Theorem 19).

We now define the conditional distribution of a random variable X given a
$\sigma$-field $\mathcal{G}$ to be a certain distribution-valued function.


Definition 9. Let X be a $(\Psi, \mathcal{H})$-valued random variable defined on a probability
space $(\Omega, \mathcal{F}, P)$, $\mathcal{G}$ a sub-$\sigma$-field of $\mathcal{F}$, and $\mathcal{Q}$ the space of probability
measures on $(\Psi, \mathcal{H})$. A function $Z : \Omega \to \mathcal{Q}$ is a conditional distribution of X
given $\mathcal{G}$ if for each fixed $B \in \mathcal{H}$, the $[0, 1]$-valued function $Z(\cdot, B)$ is a conditional
probability of $X^{-1}(B)$ given $\mathcal{G}$.


It is natural for us to think of a conditional distribution Z as a 'random
distribution', but technically speaking, in order to do so we need to make the
target space $\mathcal{Q}$ into a measurable space. The following definition has been constructed
so that if Z is a conditional distribution given some $\sigma$-field $\mathcal{G}$, then Z
is automatically a $\mathcal{G}$-measurable $\mathcal{Q}$-valued random variable.


Definition 10. Let $(\Psi, \mathcal{H})$ be a measurable space, and denote by $\mathcal{Q}$ the set
of probability measures on $(\Psi, \mathcal{H})$. The measurable space of probability measures
on $(\Psi, \mathcal{H})$ is $(\mathcal{Q}, \mathcal{S})$, where $\mathcal{S}$ is the smallest $\sigma$-field such that for each $B \in \mathcal{H}$
the function from $\mathcal{Q}$ to $[0, 1]$ defined by $Q \rightsquigarrow Q(B)$ is $\mathcal{S}$-measurable. A random
distribution on $(\Psi, \mathcal{H})$ is a measurable function from some probability space
$(\Omega, \mathcal{F}, P)$ to $(\mathcal{Q}, \mathcal{S})$.


Most texts do not involve the measurable space $(\mathcal{Q}, \mathcal{S})$ in their treatment of
conditional distributions. Nevertheless, we feel that coming to grips with Definition 10
will provide the student with a deeper understanding of Definition 9, and
so we have provided two problems to aid in that effort. Furthermore, it turns out
that $(\mathcal{Q}, \mathcal{S})$ has some nice properties that will be useful later in the book, particularly
in our discussion of Markov sequences in Chapter 26 and Markov processes
in Chapter 31. These properties are given in Lemma 21 and Theorem 22 of
the present chapter.


Problem 18. Let $(\Omega, \mathcal{F}, P)$, $(\Psi, \mathcal{H})$, and $(\mathcal{Q}, \mathcal{S})$ be as in the definition of random
distribution. Prove that a function $Z : (\Omega, \mathcal{F}, P) \to (\mathcal{Q}, \mathcal{S})$ is a random distribution
if and only if $Z(\cdot, B)$ is a $[0, 1]$-valued random variable for each fixed $B \in \mathcal{H}$.


Problem 19. Let Z be a random distribution on $\mathbb{R}$. Show that the set

$$[Z = \delta_x \text{ for some } x \in \mathbb{R}]$$

is measurable.




Problem 20. Suppose that X is a random variable measurable with respect to a
$\sigma$-field $\mathcal{G}$. Show that $\omega \rightsquigarrow \delta_{X(\omega)}$ is a conditional distribution of X given $\mathcal{G}$.


Problem 21. Suppose that X is a random variable with distribution Q. Show that
X is independent of a $\sigma$-field $\mathcal{G}$ if and only if the constant random distribution
Z = Q is a conditional distribution of X given $\mathcal{G}$.


Here are some results for using a conditional distribution of one random vari- 
able to find conditional distributions of other related random variables. 


Proposition 11. Let Z denote a conditional distribution of some $(\Psi, \mathcal{H})$-valued
random variable X given a $\sigma$-field $\mathcal{G}$, and denote by h a measurable function
from $(\Psi, \mathcal{H})$ to a measurable space $(\Xi, \mathcal{K})$. Then

$$C \rightsquigarrow Z(\cdot, h^{-1}(C)), \quad C \in \mathcal{K}, \tag{21.4}$$

is a conditional distribution of the random variable $h \circ X$ given $\mathcal{G}$.


PROOF. That $C \rightsquigarrow Z(\omega, h^{-1}(C))$, $C \in \mathcal{K}$, is a probability measure for each
$\omega$ follows from the fact that h is measurable and Z takes values in the space
of probability measures on $\mathcal{H}$. The measurability of the function (21.4) is a
consequence of the measurability of the functions h and $B \rightsquigarrow Z(\cdot, B)$, $B \in \mathcal{H}$.

To check that $Z(\cdot, h^{-1}(C))$ is a conditional probability given $\mathcal{G}$ of $(h \circ X)^{-1}(C)$
for $C \in \mathcal{K}$, we calculate $E(Z(\cdot, h^{-1}(C))\, I_A)$ for $A \in \mathcal{G}$. Since Z itself is a
conditional distribution of X given $\mathcal{G}$, the definition implies that $Z(\cdot, h^{-1}(C))$ is
a conditional probability of the event $[X \in h^{-1}(C)]$ given $\mathcal{G}$, so

$$E(Z(\cdot, h^{-1}(C))\, I_A) = P([X \in h^{-1}(C)] \cap A),$$

which equals $P([h \circ X \in C] \cap A)$, as desired. □


Proposition 12. Let $\mathcal{G}$ be a $\sigma$-field and, for $i = 1, 2$, let $X_i$ be a $(\Psi_i, \mathcal{H}_i)$-valued
random variable. Suppose that $X_1$ is $\mathcal{G}$-measurable and that a conditional
distribution $Z$ of $X_2$ given $\mathcal{G}$ exists. Then
$$\delta_{X_1} \times Z$$
is a conditional distribution of the ordered pair $(X_1, X_2)$ given $\mathcal{G}$.
The following is a consequence of the preceding two propositions. 


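Corollary 13. In the setting of Proposition 12, let $h\colon (\Psi_1, \mathcal{H}_1) \times (\Psi_2, \mathcal{H}_2) \to (\Xi, \mathcal{K})$
be a measurable function, and for $x_1 \in \Psi_1$, let $Z_{x_1}$ be the random distribution
induced from $Z$ by the function $x_2 \rightsquigarrow h(x_1, x_2)$. That is,
$$Z_{x_1}(\cdot, C) = Z(\cdot, \{x_2 : h(x_1, x_2) \in C\}), \quad C \in \mathcal{K}.$$
Then $Z_{X_1}$ is a conditional distribution of $h \circ (X_1, X_2)$ given $\mathcal{G}$.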


Problem 22. Prove Proposition 12 and Corollary 13. 




Problem 23. Let $X$ and $Y$ be $\mathbb{R}^d$-valued random variables and suppose that a
conditional distribution of $Y$ given $\sigma(X)$ exists. In terms of this conditional
distribution, find a conditional distribution of $X + Y$ given $\sigma(X)$.


* Problem 24. Let $X$ and $Y$ be independent R-valued random variables. In terms
of the distribution of $Y$, find a conditional distribution of $X \wedge Y$ given $\sigma(X)$.


Example 3. Let $(S_0 = 0, S_1, S_2, \dots)$ be a random walk in $\mathbb{R}^d$ with distribution
$Q$, and fix a nonnegative integer $n$. It seems intuitively clear that given
$\sigma(S_n)$, a conditional distribution of $(S_n, S_{n+1}, \dots)$ is $Q_{S_n}$, where for each $s \in \mathbb{R}^d$,
$Q_s$ is the distribution of the random sequence $(s + S_0, s + S_1, s + S_2, \dots)$. Let
us examine how this follows from previous results.

Note that each $Q_s$ is the distribution induced from $Q$ by the function
$$(s_0, s_1, s_2, \dots) \rightsquigarrow h(s, (s_0, s_1, s_2, \dots)) \overset{\text{def}}{=} (s + s_0, s + s_1, s + s_2, \dots).$$
Denoting the steps of the random walk by $X_1, X_2, \dots$,
$$(S_n, S_{n+1}, S_{n+2}, \dots) = h(S_n, (0, X_{n+1}, X_{n+1} + X_{n+2}, \dots)).$$
Since the sequence $(0, X_{n+1}, X_{n+1} + X_{n+2}, \dots)$ has (unconditional) distribution
$Q$ and is independent of $\sigma(S_n)$, its conditional distribution given $\sigma(S_n)$ is also
$Q$, by Problem 21. The desired conclusion follows from Corollary 13.
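
The conclusion of Example 3 can be checked by exact enumeration in a small case. The following is only a sketch under assumptions that are not in the text: a one-dimensional walk with steps $\pm 1$ of probability $1/2$ each, truncated to a finite horizon, with hypothetical helper names `walk_law`, `conditional_future_law`, and `shifted_law`.

```python
from fractions import Fraction
from itertools import product
from collections import defaultdict

STEPS = (-1, 1)            # assumed step values, each with probability 1/2
P_STEP = Fraction(1, 2)

def walk_law(length):
    """Exact law of (S_0, ..., S_length) as a dict {path: probability}."""
    law = {}
    for steps in product(STEPS, repeat=length):
        path, s = [0], 0
        for x in steps:
            s += x
            path.append(s)
        law[tuple(path)] = P_STEP ** length
    return law

def conditional_future_law(n, m):
    """For each value s of S_n, the conditional law of
    (S_n, ..., S_{n+m}) given S_n = s."""
    joint = defaultdict(lambda: defaultdict(Fraction))
    for path, p in walk_law(n + m).items():
        joint[path[n]][path[n:]] += p
    cond = {}
    for s, futures in joint.items():
        total = sum(futures.values())
        cond[s] = {future: p / total for future, p in futures.items()}
    return cond

def shifted_law(s, m):
    """Truncated Q_s: the law of (s + S_0, ..., s + S_m)."""
    return {tuple(s + v for v in path): p for path, p in walk_law(m).items()}

n, m = 3, 2
cond = conditional_future_law(n, m)
# On the event [S_n = s] the conditional law of the future equals Q_s.
agrees = all(cond[s] == shifted_law(s, m) for s in cond)
```

Exact rational arithmetic is used so that the comparison is an equality of distributions rather than a numerical approximation.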


A good technique for some of the following problems is to guess a correct 
answer and then verify its correctness. 


* Problem 25. Let $X$ be an R-valued random variable. Assume that the distribution
of $X$ has a density $f$ with respect to a $\sigma$-finite measure $\mu$ on R, where $\mu$ satisfies
$\mu(B) = \mu(-B)$ for all Borel sets $B$. Find a formula for a conditional distribution
of $X$ given $\sigma(|X|)$.


Problem 26. Apply the preceding problem to X uniformly distributed on [—1, 2]. 


Problem 27. Let $X_1, \dots, X_n$ be iid R-valued random variables and denote the
corresponding order statistics by $Y_1, Y_2, \dots, Y_n$. Find a conditional distribution of
$X = (X_1, X_2, \dots, X_n)$ given $\sigma(Y_1, Y_2, \dots, Y_n)$.


Problem 28. Let $X$ be any $(\Psi, \mathcal{H})$-valued random variable, and find, in terms of
the distribution of $X$, a conditional distribution of $X$ given $\sigma(\{\omega : X(\omega) \in B\})$,
where $B$ is a fixed member of $\mathcal{H}$.


Problem 29. Let $X = (X_1, \dots, X_d)$ be a random vector, the coordinates of which
are independent standard gamma random variables with parameters $\gamma_1, \dots, \gamma_d$,
respectively. Find a conditional distribution of $X$ given $\sigma(X_1 + X_2 + \dots + X_d)$.
Discuss relations with Problem 34 of this chapter and Example 2, Problem 28, and
Problem 30 of Chapter 10.




21.3. Conditional densities 


Just as (unconditional) distributions are often most easily studied via densities, 
so conditional distributions are often expressed in terms of their densities, when 
they exist. 


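Definition 14. Let $X$ be a $(\Psi, \mathcal{H})$-valued random variable defined on a probability
space $(\Omega, \mathcal{F}, P)$, $\mu$ a $\sigma$-finite measure on $(\Psi, \mathcal{H})$, and $\mathcal{G}$ a sub-$\sigma$-field of $\mathcal{F}$.
A nonnegative measurable function $q$ on $(\Omega \times \Psi, \mathcal{F} \times \mathcal{H})$ is called a conditional
density of $X$ with respect to $\mu$ given $\mathcal{G}$ if
$$B \rightsquigarrow \int_B q(\cdot, x)\, \mu(dx)$$
is a conditional distribution of $X$ given $\mathcal{G}$.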


Sometimes the $\sigma$-finite measure $\mu$ in the preceding definition is not mentioned,
especially if it is Lebesgue measure or counting measure.


* Problem 30. Let $X$ be exponentially distributed with mean $\lambda$. Let $\mathcal{G}$ be the $\sigma$-field
generated by the event $\{\omega : X(\omega) > t\}$, where $t$ is a fixed nonnegative real number.
Find a conditional density of $X$ given $\mathcal{G}$.


If two random variables have a joint density with respect to a product measure, 
then there is a nice formula for a conditional density of one of the two random 
variables, given the $\sigma$-field generated by the other one.


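Proposition 15. For $i = 1, 2$, let $X_i$ be $(\Psi_i, \mathcal{H}_i)$-valued random variables
defined on a common probability space $(\Omega, \mathcal{F}, P)$. Assume that the distribution
of the random vector $(X_1, X_2)$ has a density $f$ with respect to a $\sigma$-finite product
measure $\mu = \mu_1 \times \mu_2$ on $\mathcal{H} = \mathcal{H}_1 \times \mathcal{H}_2$. Let
$$g(x_1) = \int_{\Psi_2} f(x_1, x_2)\, \mu_2(dx_2).$$
Then the function
$$q(\omega, x_2) = \begin{cases} \dfrac{f(X_1(\omega), x_2)}{g(X_1(\omega))} & \text{if } g(X_1(\omega)) > 0 \\[2ex] \displaystyle\int_{\Psi_1} f(x_1, x_2)\, \mu_1(dx_1) & \text{if } g(X_1(\omega)) = 0 \end{cases}$$
is a conditional density of $X_2$ with respect to $\mu_2$ given $\sigma(X_1)$.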
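
For intuition, the formula of Proposition 15 can be tried out when $\mu_1$ and $\mu_2$ are counting measures on finite sets, so that the integrals become sums. The sketch below is illustrative only: the joint density `f` is a made-up table, and the functions take the value $x_1$ of $X_1$ as argument in place of the sample point $\omega$.

```python
from fractions import Fraction

# Hypothetical joint density f(x1, x2) with respect to counting measure on
# {0, 1, 2, 3} x {0, 1}. The values are invented and sum to 1; the row
# x1 = 3 has probability zero so that the second branch of q is exercised.
f = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(1, 8),
    (1, 0): Fraction(1, 4), (1, 1): Fraction(0),
    (2, 0): Fraction(1, 6), (2, 1): Fraction(1, 3),
    (3, 0): Fraction(0),    (3, 1): Fraction(0),
}
X1_VALUES = (0, 1, 2, 3)
X2_VALUES = (0, 1)

def g(x1):
    """g(x1): integrate f(x1, .) against mu_2 (here a finite sum)."""
    return sum(f[(x1, x2)] for x2 in X2_VALUES)

def q(x1, x2):
    """Conditional density of X2 given sigma(X1), on the event [X1 = x1]."""
    if g(x1) > 0:
        return f[(x1, x2)] / g(x1)
    # On the null event [g(X1) = 0], use the marginal density of X2.
    return sum(f[(y1, x2)] for y1 in X1_VALUES)

def defining_property_holds(A, B):
    """Check E(Z(., B) I_[X1 in A]) = P[X1 in A, X2 in B], where
    Z(., B) is the sum of q(X1, x2) over x2 in B."""
    lhs = sum(g(x1) * sum(q(x1, x2) for x2 in B) for x1 in A)
    rhs = sum(f[(x1, x2)] for x1 in A for x2 in B)
    return lhs == rhs

ok = all(defining_property_holds(A, B)
         for A in [(0,), (1, 2), (0, 3), X1_VALUES]
         for B in [(0,), (1,), X2_VALUES])
```

The second branch of `q` only matters on an event of probability zero, but it guarantees that `q(x1, .)` is a probability density for every `x1`, as the definition of a conditional distribution requires.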


Problem 31. Prove the preceding proposition. 


Problem 32. Let $(X, Y)$ be an $\mathbb{R}^2$-valued normally distributed random vector with
mean $(0, 0)$ and a covariance matrix $\Sigma$ having positive eigenvalues. Find a conditional
density of $Y$ given $\sigma(X)$. Identify the corresponding conditional distribution
by name and give a formula for the appropriate 'conditional parameters' (mean and
variance).




Problem 33. Discuss the cases in which the covariance matrix in the preceding 
problem has one or two eigenvalues equal to 0. 


* Problem 34. Let $X_1, X_2, \dots, X_d$ be independent positive random variables having
gamma densities
$$x \rightsquigarrow \frac{x^{\gamma_i - 1} e^{-x}}{\Gamma(\gamma_i)}, \quad x > 0,$$
for $i = 1, 2, \dots, d$. Find an expression for the density of the random vector
$$\Bigl(X_1, X_2, \dots, X_{d-1}, \sum_{k=1}^{d} X_k\Bigr).$$
Also, find a conditional density of $(X_1, X_2, \dots, X_{d-1})$ given $\sigma(X_1 + X_2 + \dots + X_d)$.


Problem 35. Let $X_1, \dots, X_n$ be iid R-valued random variables, with common density
$f$. Show that for each $k$, the conditional density of $X_k$ given $\sigma(S)$, where
$S = X_1 + \dots + X_n$, is the function
$$(\omega, x) \rightsquigarrow \frac{f(x)\, f^{*(n-1)}(S(\omega) - x)}{f^{*n}(S(\omega))}.$$


21.4. Existence and uniqueness of conditional distributions 


The next proposition says that conditional distributions of R-valued random vari- 
ables always exist regardless of the probability space on which they are defined 
and regardless of the conditioning $\sigma$-field.


Proposition 16. Let $X$ be an $\overline{\mathbb{R}}$-valued random variable defined on a probability
space $(\Omega, \mathcal{F}, P)$ and let $\mathcal{G}$ be a sub-$\sigma$-field of $\mathcal{F}$. Then a conditional
distribution $Z$ of $X$ given $\mathcal{G}$ exists. Moreover, $Z$ is unique in the sense that if $Z'$ is
any other conditional distribution of $X$ given $\mathcal{G}$, then $Z = Z'$ a.s.


PARTIAL PROOF. For each rational $x \in \mathbb{R}$, set
$$F(\omega, x) = P(\{\omega' : X(\omega') \le x\} \mid \mathcal{G})(\omega).$$
We leave it as an exercise to show that there exists a single null event $N$ such
that for $\omega \in N^c$, $F(\omega, \cdot)$ is increasing and right continuous as a function on the
rationals, and has limits in $[0, 1]$ at $-\infty$ and at $\infty$. Thus, for each $\omega \in N^c$,
the domain of $F(\omega, \cdot)$ can be extended uniquely to $\mathbb{R}$ so that it is a distribution
function for $\overline{\mathbb{R}}$. Let $Z(\omega)$ be the corresponding probability measure on $\overline{\mathbb{R}}$. We
take care of $\omega \in N$ by letting $Z(\omega)$ equal the (unconditional) distribution of
$X$ for such $\omega$. We will show that $Z$ is a conditional distribution of $X$ given $\mathcal{G}$.
By construction, $Z$ is a probability measure for each $\omega$. To finish the proof of
existence, we only need show that, for each Borel set $B$, $Z(\cdot, B)$ is a conditional
probability of $X^{-1}(B)$ given $\mathcal{G}$.

It is straightforward to check that the class of those $B$ for which $Z(\cdot, B)$
is a conditional probability of $X^{-1}(B)$ given $\mathcal{G}$ is a Sierpiński class and that it
contains the interval $[-\infty, \infty]$ and all intervals of the form $[-\infty, x]$ for $x$ rational.
An appeal to the Sierpiński Class Theorem completes the existence proof.
The proof of uniqueness is left to the reader. □


Problem 36. Complete the proof of the preceding proposition by: (i) showing the 
existence of an event N with the claimed properties, (ii) supplying the details con- 
nected with the application of the Sierpinski Class Theorem, and (iii) proving the 
uniqueness assertion. Hint: Use Proposition 6 and, for uniqueness, the observation 
that Definition 1 entails uniqueness for each fixed $x$.


In order to generalize Proposition 16, we introduce two definitions. 


Definition 17. Two measurable spaces are called isomorphic if there exists
a bijective function $\varphi$ between them such that both $\varphi$ and $\varphi^{-1}$ are measurable.


Definition 18. A measurable space is called a Borel space if it is isomorphic 
to some $(A, \mathcal{B}(A))$, where $A$ is a Borel set in $[0, 1]$ and $\mathcal{B}(A)$ is the $\sigma$-field of Borel
subsets of A. 


Theorem 19. [Existence and Uniqueness of Conditional Distributions] Denote
by $(\Psi, \mathcal{H})$ a Borel space, by $(\Omega, \mathcal{F}, P)$ a probability space, and by $\mathcal{G}$ a
sub-$\sigma$-field of $\mathcal{F}$. Then every $(\Psi, \mathcal{H})$-valued random variable defined on $(\Omega, \mathcal{F}, P)$ has
a conditional distribution given $\mathcal{G}$. Moreover, such conditional distributions are
unique in the same sense as in Proposition 16.


Problem 37. Use Proposition 16 to prove the preceding theorem. 


In view of the uniqueness portion of the preceding theorem we often speak, in
the Borel-space setting, of 'the' conditional distribution. For a random variable
$X$ having (unconditional) distribution $Q$ on some Borel space, we use the special
notation
$$Q(\cdot \mid \mathcal{G})$$
to denote the conditional distribution of $X$ given $\mathcal{G}$. Of course, like other random
distributions, this function takes two arguments: a measurable set $B$ and a
sample point $\omega$, so that notation like $Q(\cdot \mid \mathcal{G})(\omega)$, $Q(B \mid \mathcal{G})(\cdot)$, and $Q(B \mid \mathcal{G})$, as
well as the more explicit
$$\omega \rightsquigarrow [B \rightsquigarrow Q(B \mid \mathcal{G})(\omega)],$$
will sometimes be used to reflect different points of view. In those cases where
$\mathcal{G} = \sigma(Y)$ for some random variable $Y$, we will often write
$$Q(\cdot \mid Y) \quad \text{for} \quad Q(\cdot \mid \sigma(Y)).$$


When the target space of a random variable $X$ is different from its domain
$\Omega$, the notation introduced in the preceding paragraph is distinct from that
introduced earlier for conditional probabilities, in spite of the similarities. Even
when the target of $X$ is $\Omega$, there is no danger of confusion as long as $Q$ is
different from the underlying probability measure $P$. However, from time to
time, it will be convenient to let $X$ be the identity function on $(\Omega, \mathcal{F}, P)$. The
distribution of this special random variable is $P$, so when $\Omega$ is itself a Borel
space, $P(\cdot \mid \mathcal{G})$ will denote the conditional distribution of $X$ given $\mathcal{G}$. One might
worry that the expression $P(A \mid \mathcal{G})$ now denotes two different things: (i) the
conditional probability given $\mathcal{G}$ of the event $A$, and (ii) the value given to $A$
by the conditional distribution of $X$ given $\mathcal{G}$. But the definitions ensure that
these two quantities agree almost surely for any fixed event $A$, so there is no real
conflict.

It is fortunate that many of the measurable spaces encountered in probability 
theory are Borel spaces. The next proposition gives a good idea of the prevalence 
of Borel spaces. The term ‘Polish space’ used here is defined in Chapter 18; for 
those who have not read that chapter, it suffices to mention that $\mathbb{R}$, $\overline{\mathbb{R}}$, $\mathbb{R}^d$, $\mathbb{R}^\infty$
(with their usual topologies), and all countable sets can be regarded as Polish 
spaces. 


Proposition 20. Every Polish space is a Borel space. A product of a finite 
or countable number of Borel spaces is itself a Borel space. Every measurable 
subset A of a Borel space B is itself a Borel space, the measurable subsets in A 
being those subsets of A that are measurable in B. 


PROOF. The statement that Borel subsets of Borel spaces are themselves
Borel spaces is obvious. It is easy to see from the definition of Borel space
that any finite or countable product of Borel spaces is isomorphic to a Borel
subset of the infinite-dimensional cube $[0, 1]^\infty$, and it follows from Lemma 4 of
Chapter 18 that any Polish space is also isomorphic to a Borel subset of the
infinite-dimensional cube. Thus, to complete the proof, it is sufficient to prove
that $[0, 1]^\infty$ is a Borel space.

By using binary representations of the coordinates of points $x = (x_1, x_2, \dots) \in [0, 1]^\infty$
(being careful as usual about the ambiguous cases), we can associate to
each $x$ a unique doubly indexed sequence of 0's and 1's. A simple relabeling of
indices allows us to associate to each $x$ a unique (singly indexed) sequence of 0's
and 1's. We leave it to the reader to check that all of this can be done in such
a way as to create an isomorphism between the infinite-dimensional unit cube
$[0, 1]^\infty$ and a Borel subset of the space $\{0, 1\}^\infty$ of all sequences of 0's and 1's.
Thus, it is sufficient to show that $\{0, 1\}^\infty$ is a Borel space. This task is easily
accomplished by appropriate use of the binary representation. □
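
The "simple relabeling of indices" in the proof can be made concrete. The following sketch is one possible choice, not necessarily the one the authors have in mind: it uses the Cantor pairing function to turn the doubly indexed array of binary digits into a single sequence, works with finitely many coordinates and digits only, uses 0-based indices, and resolves the ambiguous dyadic cases by taking the terminating expansion on $[0, 1)$. The function names are hypothetical.

```python
from fractions import Fraction
from math import isqrt

def pair(i, j):
    """Cantor pairing: a bijection from pairs of nonnegative integers
    onto the nonnegative integers."""
    return (i + j) * (i + j + 1) // 2 + j

def unpair(k):
    """Inverse of pair."""
    w = (isqrt(8 * k + 1) - 1) // 2       # index of the diagonal containing k
    j = k - w * (w + 1) // 2
    return w - j, j

def binary_digits(x, count):
    """First `count` binary digits of x in [0, 1), terminating expansion."""
    digits = []
    for _ in range(count):
        x *= 2
        bit = int(x >= 1)
        digits.append(bit)
        x -= bit
    return digits

def relabel(point, length):
    """First `length` terms of the single 0-1 sequence associated with
    point = (x_0, x_1, ...): term k is digit j of coordinate i,
    where (i, j) = unpair(k)."""
    out = []
    for k in range(length):
        i, j = unpair(k)
        out.append(binary_digits(point[i], j + 1)[j])
    return out

point = (Fraction(1, 2), Fraction(1, 4), Fraction(3, 4), Fraction(0))
bits = relabel(point, 10)   # uses digits of the first four coordinates only
```

Because `pair` is a bijection, every digit of every coordinate appears exactly once in the relabeled sequence, which is what makes the correspondence one-to-one.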


One consequence of Proposition 20 is that conditional distributions of infinite
sequences of $\mathbb{R}$- or $\overline{\mathbb{R}}$-valued random variables always exist, not just the
conditional distributions of the individual random terms. This fact is useful for the
study of random sequences.




Recall from Proposition 8 that a conditional probability given $\sigma(V)$, where
V is a random variable, can be written as the composition of V and a measur- 
able [0, 1]-valued function on the target of V. It is natural to ask if there is a 
corresponding theorem for conditional distributions. Several exercises, including 
Problem 23, Problem 25, and Problem 27, indicate that the answer may be ‘yes’. 
If we knew that the distributions of interest formed a Borel space, then we could 
apply Lemma 7 just as we did to obtain Proposition 8. 


Lemma 21. The measurable space of probability measures on a Borel space 
is a Borel space. 


Before proving this lemma we record its important consequence, referred to 
in the discussion preceding the lemma. 


Theorem 22. Let $X\colon (\Omega, \mathcal{F}, P) \to (\Psi, \mathcal{G})$ and $Y\colon (\Omega, \mathcal{F}, P) \to (\Theta, \mathcal{H})$ be
random variables defined on a common probability space, and assume that $(\Psi, \mathcal{G})$
is a Borel space. Then the conditional distribution of $X$ given $\sigma(Y)$ can be
written in the form $R \circ Y$, where $R$ is a measurable function from $(\Theta, \mathcal{H})$ to the
measurable space of probability measures on $(\Psi, \mathcal{G})$.


The function $R$ of the preceding theorem is distribution-valued, so our notational
conventions dictate that we write $R(y, B)$ for the measure given to the set
$B \in \mathcal{G}$ by $R$ when it is evaluated at $y \in \Theta$. There is a special notation for this
quantity:
$$P[X \in B \mid Y = y],$$
with corresponding expressions like $P[X \in \cdot \mid Y = y]$ and $P[X \in B \mid Y = \cdot\,]$
also being used. Do not confuse this notation with expressions like $P[X \in \cdot \mid Y]$,
introduced earlier in conjunction with the conditional distribution of $X$ given
$\sigma(Y)$. The relationship between $P[X \in \cdot \mid Y]$ and $P[X \in \cdot \mid Y = y]$ is described
by Theorem 22.

It remains for us to prove Lemma 21. Those readers who have skipped Chap- 
ter 18 may want to omit this proof. 


PROOF OF Lemma 21. Let $\rho$ denote the usual metric on $[0, 1]$, the distance
between two points being the absolute value of their difference. The space
$([0, 1], \rho)$ is a Polish space. We also call this space $([0, 1], \mathcal{B})$, with $\mathcal{B}$ denoting
the $\sigma$-field of Borel subsets of $[0, 1]$, when we want to identify it as a measurable
space rather than as a metric space. By Theorem 24 and Theorem 25 of
Chapter 18 the set of probability measures on $([0, 1], \rho)$ can be turned into a
metric space with a metric $\beta$ in such a way that convergence in $\beta$ corresponds to
the standard definition of convergence of sequences of probability measures on a
Polish space. Here is a list of the measurable spaces to be used in this proof:

• a Borel space $(\Psi, \mathcal{G})$;
• the measurable space $(\mathcal{Q}, \mathfrak{S})$ of probability measures on $(\Psi, \mathcal{G})$;
• the Polish space $([0, 1], \rho)$;
• the measurable space $(\mathcal{R}, \mathfrak{B})$ of probability measures on $([0, 1], \rho)$;
• the Polish space $(\mathcal{R}, \beta)$ of probability measures on $([0, 1], \rho)$;
• the measurable space $(A, \mathcal{A})$ for $A$ a certain Borel subset of $([0, 1], \mathcal{B})$;
• the measurable space $(\mathcal{R}_A, \mathfrak{B}_A)$ of probability measures on $(A, \mathcal{A})$.

Our goal is to prove that $(\mathcal{Q}, \mathfrak{S})$ is a Borel space.

The first step is to prove that $(\mathcal{R}, \mathfrak{B})$ is a Borel space. We will accomplish
this by showing that, when regarded as measurable spaces, $(\mathcal{R}, \beta)$ and $(\mathcal{R}, \mathfrak{B})$
are the same. Since $(\mathcal{R}, \beta)$ is a Polish space, the desired conclusion in this first
step then follows from Proposition 20.

We know that $\mathfrak{B}$ is the $\sigma$-field generated by functions of the form $R \rightsquigarrow R(B)$
for Borel subsets $B \subseteq [0, 1]$. Let us prove that such a function is measurable
when its domain is taken to be the Polish space $(\mathcal{R}, \beta)$. If $B$ is compact, its
indicator function $I_B$ is the limit of a decreasing sequence $(f_n)$ of continuous
bounded functions. Since each function $R \rightsquigarrow \int f_n\, dR$ is continuous and therefore
measurable, the function

(21.5) $$R \rightsquigarrow \int I_B\, dR = R(B)$$

is measurable. The collection of compact sets is closed under finite intersection
and the collection of sets $B$ for which the function (21.5) is measurable is easily
seen to be a Sierpiński class. Therefore, the function (21.5) is measurable for every
Borel set $B$. For the other direction we need to prove that $\mathfrak{B}$ contains all open sets,
and therefore all Borel sets, in the Polish space $(\mathcal{R}, \beta)$. The open sets are inverse
images of open sets via functions of the form $R \rightsquigarrow \int f\, dR$, where $f$ is continuous
and bounded. To finish this portion of the proof we only need show that such a
function is measurable when its domain is regarded as $(\mathcal{R}, \mathfrak{B})$. But this is easy,
since $f$ can be approximated by simple functions and $R \rightsquigarrow \int g\, dR$ for $g$ simple is a
linear combination of functions of the form (21.5).

Our next task is to prove that $(\mathcal{R}_A, \mathfrak{B}_A)$ is a Borel space for every Borel set
$A \subseteq [0, 1]$ and thus, in particular, for the yet-to-be-made choice of $A$. We will do
this by proving that $\mathfrak{B}_A$ consists of exactly those members of $\mathfrak{B}$ that are subsets
of $\mathcal{R}_A$ (and thus, in particular, that $\mathcal{R}_A \in \mathfrak{B}$). We know that $\mathfrak{B}_A$ is generated
by sets of the form
$$\{R \in \mathcal{R}_A : R(B) \in C\} = \{R \in \mathcal{R} : R(B) \in C\} \cap \{R \in \mathcal{R} : R(A) = 1\} \in \mathfrak{B},$$
where $C$ is a Borel set in $[0, 1]$. Hence, every member of $\mathfrak{B}_A$ is a member of $\mathfrak{B}$.
For the opposite direction we will prove that every member of $\mathfrak{B}$ is the union of
three sets: a member of $\mathfrak{B}_A$, a member of $\mathfrak{B}_{A^c}$, and a set of the form
$$\Bigl\{R \in \mathcal{R} : 0 < R(A) < 1,\ \frac{1}{R(A)}\, R|_A \in S_1,\ \frac{1}{R(A^c)}\, R|_{A^c} \in S_2\Bigr\},$$
where $S_1 \in \mathfrak{B}_A$, $S_2 \in \mathfrak{B}_{A^c}$, and $R|_A$ and $R|_{A^c}$ denote $R$ restricted to $A$ and to
$A^c$, respectively. This is the case because a collection of sets generating $\mathfrak{B}$ is
included in this list and the sets in this list constitute a $\sigma$-field. But the sets in
this list that are subsets of $\mathcal{R}_A$ are clearly members of $\mathfrak{B}_A$. We have shown that
$(\mathcal{R}_A, \mathfrak{B}_A)$ is a Borel space.

Since $(\Psi, \mathcal{G})$ is a Borel space there is a one-to-one function $\varphi$ from $\Psi$ onto
a Borel subset $A$ of $[0, 1]$ that induces a one-to-one correspondence between $\mathcal{G}$
and $\mathcal{A}$. We will show that $\varphi$ induces a one-to-one function $\hat\varphi$ from $\mathcal{Q}$ to $\mathcal{R}_A$
which itself induces a one-to-one correspondence between $\mathfrak{S}$ and $\mathfrak{B}_A$; then the
lemma follows because $(\mathcal{R}_A, \mathfrak{B}_A)$ is known to be a Borel space, by the preceding
paragraph. For $Q \in \mathcal{Q}$, define $\hat\varphi(Q)$ by
$$(\hat\varphi(Q))(B) = Q(\varphi^{-1}(B))$$
for $B$ a Borel subset of $A$. Clearly $\hat\varphi$ is one-to-one because $\varphi$ is, and also the
image of $\hat\varphi$ is $\mathcal{R}_A$. Fix a measurable subset $D$ of $\Psi$ and the corresponding
Borel subset $B = \varphi(D)$ of $A$. Then consider the functions $g$ and $h$ defined by
$g(Q) = Q(D)$ on $\mathcal{Q}$ and $h(R) = R(B)$ on $\mathcal{R}_A$. Since $g = h \circ \hat\varphi$ and both $\varphi$ and
$\varphi^{-1}$ are measurable, $\hat\varphi$ induces a one-to-one correspondence between $\mathfrak{S}$ and $\mathfrak{B}_A$,
as desired. □


21.5. Conditional independence 
We begin with the following natural definition. 


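Definition 23. Let $(\Omega, \mathcal{F}, P)$ be a probability space and $\mathcal{F}_1$, $\mathcal{F}_2$, and $\mathcal{G}$
sub-$\sigma$-fields of $\mathcal{F}$. The $\sigma$-fields $\mathcal{F}_1$ and $\mathcal{F}_2$ are conditionally independent given $\mathcal{G}$
if
$$P(A_1 \cap A_2 \mid \mathcal{G}) = P(A_1 \mid \mathcal{G})\, P(A_2 \mid \mathcal{G}) \quad \text{a.s.}$$
for all $A_i \in \mathcal{F}_i$, $i = 1, 2$.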
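
On a finite probability space, a conditional probability given a $\sigma$-field generated by a partition is constant on each atom, so Definition 23 can be checked atom by atom. The sketch below uses a made-up experiment that is not in the text: a coin whose heads probability is chosen at random from two hypothetical values and then flipped twice. The two flips are conditionally independent given the $\sigma$-field generated by the choice of coin, but they are not independent.

```python
from fractions import Fraction
from itertools import product

BIASES = (Fraction(1, 4), Fraction(3, 4))   # hypothetical heads probabilities

# Sample points are triples (bias, first flip, second flip); 1 means heads.
P = {}
for b, x1, x2 in product(BIASES, (0, 1), (0, 1)):
    P[(b, x1, x2)] = (Fraction(1, 2)
                      * (b if x1 else 1 - b)
                      * (b if x2 else 1 - b))

def prob(event):
    """Probability of an event given as a predicate on sample points."""
    return sum(p for w, p in P.items() if event(w))

def cond_prob(event, atom):
    """Value of P(event | G) on an atom of the partition generating G."""
    return prob(lambda w: event(w) and atom(w)) / prob(atom)

def first_is(v):
    return lambda w: w[1] == v     # event in F_1 = sigma(first flip)

def second_is(v):
    return lambda w: w[2] == v     # event in F_2 = sigma(second flip)

atoms = [lambda w, b=b: w[0] == b for b in BIASES]   # atoms of G = sigma(bias)

conditionally_independent = all(
    cond_prob(lambda w: first_is(u)(w) and second_is(v)(w), atom)
    == cond_prob(first_is(u), atom) * cond_prob(second_is(v), atom)
    for atom in atoms for u in (0, 1) for v in (0, 1)
)

both_heads = lambda w: w[1] == 1 and w[2] == 1
independent = prob(both_heads) == prob(first_is(1)) * prob(second_is(1))
```

Checking the four combinations of outcomes is enough here because each of the two $\sigma$-fields is generated by a single event.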


We leave it to the reader to extend the notion of conditional independence
to more than two $\sigma$-fields and to obtain conditional analogues of Proposition 3
and Proposition 4, both of Chapter 9.

There are many results that show how conditional independence can be used 
in ways that parallel uses of (unconditional) independence. The first and third 
problems below describe two such results. It is left to the interested reader to 
work out others. 


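Problem 38. [Conditional Borel-Cantelli Lemma] Let $(\Omega, \mathcal{F}, P)$ be a probability
space, $\mathcal{G}$ a sub-$\sigma$-field of $\mathcal{F}$, and $(A_n : n = 1, 2, \dots)$ a sequence of events
conditionally independent given $\mathcal{G}$. Prove that for almost all $\omega \in \Omega$,
$$P\bigl(\limsup_{n \to \infty} A_n \mid \mathcal{G}\bigr)(\omega) = \begin{cases} 1 \\ 0 \end{cases}$$
according as
$$\sum_{n=1}^{\infty} P(A_n \mid \mathcal{G})(\omega) \begin{cases} = \infty \\ < \infty. \end{cases}$$
Then conclude that
$$P\bigl(\limsup_{n \to \infty} A_n\bigr) = P\Bigl(\Bigl\{\omega : \sum_{n=1}^{\infty} P(A_n \mid \mathcal{G})(\omega) = \infty\Bigr\}\Bigr).$$

Problem 39. Make a direct confirmation of the result in Problem 38 for the special
case where
$$\mathcal{G} = \sigma(A_n : n = 1, 2, \dots).$$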


Problem 40. Let $X$ and $Y$ be two random variables that are conditionally independent
given a $\sigma$-field $\mathcal{G}$ and that have targets that are Borel spaces. Prove that
the conditional distribution of the ordered pair (X,Y) is the product measure of 
the conditional distribution of X and that of Y. 


Notice that the preceding problem contains information about random prod- 
uct measures, whereas the definition of conditional independence only involves 
products of numbers. 

The following proposition shows that conditioning can turn dependence into 
independence. 


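Proposition 24. Let $(\Omega, \mathcal{F}, P)$ be a probability space. Let $\mathcal{G}_1$ and $\mathcal{G}_2$ and $\mathcal{H}$
be sub-$\sigma$-fields of $\mathcal{F}$ and suppose that $\mathcal{G}_2 \subseteq \mathcal{H}$. Then $\mathcal{G}_1$ and $\mathcal{G}_2$ are conditionally
independent given $\mathcal{H}$.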


Problem 41. Prove this proposition. Hint: Use Proposition 3 and Problem 1. 


The next problem shows that conditioning can turn independence into depen- 
dence. 


Problem 42. Consider the experiment of flipping two fair coins. Let A be the event 
that the first coin comes up heads, B the event that the second coin comes up heads, 
and C the event that the two coins agree. Show that any two of the events A, B, 
and C are independent, but that they are not conditionally independent given the 
$\sigma$-field generated by the remaining event. Explain the connection between this
example and the discussion following the proof of Proposition 4. 


The preceding problem shows that the interplay between conditioning and 
independence involving more than two $\sigma$-fields can be quite interesting. The
same can be said of the following propositions, problems, and corollary. 


Problem 43. For each proposition, problem, and corollary between here and The- 
orem 29, intuitively describe the conclusion. 




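Proposition 25. Let $\mathcal{G}$, $\mathcal{H}$, and $\mathcal{K}$ be $\sigma$-fields of events in a probability space.
If $\mathcal{G}$ and $\mathcal{H}$ are conditionally independent given $\mathcal{K}$, then $\mathcal{G}$ and $\sigma(\mathcal{H}, \mathcal{K})$ are
conditionally independent given $\mathcal{K}$.


PROOF. Denote the underlying probability space by $(\Omega, \mathcal{F}, P)$. We need to
show
$$P(A \cap B \mid \mathcal{K}) = P(A \mid \mathcal{K})\, P(B \mid \mathcal{K})$$
for arbitrary $A \in \mathcal{G}$ and $B \in \sigma(\mathcal{H}, \mathcal{K})$. For fixed $A$, Proposition 6 implies that the
collection of $B$ for which this equality holds is a Sierpiński class. Thus, we may
restrict our attention to $B$ of the form $C \cap D$, where $C \in \mathcal{H}$ and $D \in \mathcal{K}$. For such
$B$, the following calculation that first uses Proposition 3, then the conditional
independence of $\mathcal{G}$ and $\mathcal{H}$ given $\mathcal{K}$, and finally Proposition 3 a second time
completes the proof:
$$P(A \mid \mathcal{K})\, P(C \cap D \mid \mathcal{K}) = P(A \mid \mathcal{K})\, P(C \mid \mathcal{K})\, I_D = P(A \cap C \mid \mathcal{K})\, I_D = P(A \cap (C \cap D) \mid \mathcal{K}). \quad □$$


Proposition 26. Let $\mathcal{G}$ and $\mathcal{H}$ be two $\sigma$-fields of events in a probability space
and let $\mathcal{G}_1$ and $\mathcal{H}_1$ be sub-$\sigma$-fields of $\mathcal{G}$ and $\mathcal{H}$, respectively. Suppose that $\mathcal{G}$ and $\mathcal{H}$
are independent. Then $\mathcal{G}$ and $\mathcal{H}$ are conditionally independent given $\sigma(\mathcal{G}_1, \mathcal{H}_1)$.


PROOF. Let $A \in \mathcal{G}$ and $B \in \mathcal{H}$. Clearly,
$$P(A \mid \mathcal{G}_1, \mathcal{H}_1)\, P(B \mid \mathcal{G}_1, \mathcal{H}_1)$$
is $\sigma(\mathcal{G}_1, \mathcal{H}_1)$-measurable. To show that it satisfies the second condition in
Proposition 2 characterizing $P(A \cap B \mid \mathcal{G}_1, \mathcal{H}_1)$, we first simplify it using Proposition 4:
$$P(A \mid \mathcal{G}_1)\, P(B \mid \mathcal{H}_1).$$
Consider a member $C \cap D$ of $\sigma(\mathcal{G}_1, \mathcal{H}_1)$, where $C \in \mathcal{G}_1$ and $D \in \mathcal{H}_1$. Then
$$E(P(A \mid \mathcal{G}_1)\, P(B \mid \mathcal{H}_1)\, I_C I_D)$$
is the expectation of the product of a bounded $\mathcal{G}$-measurable random variable
and a bounded $\mathcal{H}$-measurable random variable. Therefore it equals
$$E(P(A \mid \mathcal{G}_1)\, I_C)\, E(P(B \mid \mathcal{H}_1)\, I_D) = P(A \cap C)\, P(B \cap D) = P((A \cap B) \cap (C \cap D)).$$
We complete the proof by using the Sierpiński Class Theorem to assert that the
equality so far obtained, namely
$$E(P(A \mid \mathcal{G}_1)\, P(B \mid \mathcal{H}_1)\, I_{C \cap D}) = P((A \cap B) \cap (C \cap D)),$$
holds with $C \cap D$ replaced by an arbitrary member of $\sigma(\mathcal{G}_1, \mathcal{H}_1)$. □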


* Problem 44. Show that $\sigma(\mathcal{G}_1, \mathcal{H}_1)$ in the preceding proposition cannot be replaced
by an arbitrary sub-$\sigma$-field of $\sigma(\mathcal{G}, \mathcal{H})$.




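Proposition 27. Let $\mathcal{G} \subseteq \mathcal{H}$ and $\mathcal{K}$ be three $\sigma$-fields of events in a probability
space. Then $\mathcal{H}$ and $\mathcal{K}$ are conditionally independent given $\mathcal{G}$ if and only if, for
every $C \in \mathcal{K}$,
$$P(C \mid \mathcal{G}) = P(C \mid \mathcal{H}) \quad \text{a.s.}$$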


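PROOF. Denote the probability space by $(\Omega, \mathcal{F}, P)$. Suppose that $\mathcal{H}$ and $\mathcal{K}$
are conditionally independent given $\mathcal{G}$ and let $C \in \mathcal{K}$. We check that $P(C \mid \mathcal{G})$
satisfies the two conditions in Proposition 2 characterizing $P(C \mid \mathcal{H})$. Since
$\mathcal{G} \subseteq \mathcal{H}$, $P(C \mid \mathcal{G})$ is $\mathcal{H}$-measurable. To verify condition (ii) of Proposition 2,
we let $B \in \mathcal{H}$ and use Definition 1 and the fact that $P(C \mid \mathcal{G}) \in L_2(\Omega, \mathcal{G}, P)$ to
obtain
$$E(P(C \mid \mathcal{G})\, I_B) = E(P(C \mid \mathcal{G})\, P(B \mid \mathcal{G})).$$
From the conditional independence of $\mathcal{H}$ and $\mathcal{K}$ we deduce
$$E(P(C \mid \mathcal{G})\, P(B \mid \mathcal{G})) = E(P(C \cap B \mid \mathcal{G})) = P(C \cap B).$$
From the preceding two calculations we see that $P(C \mid \mathcal{G})$ satisfies the second of
the two characterizing conditions of $P(C \mid \mathcal{H})$.

For the converse we assume $P(C \mid \mathcal{G}) = P(C \mid \mathcal{H})$ for all $C \in \mathcal{K}$. Then we fix
$C \in \mathcal{K}$ and $B \in \mathcal{H}$ and set
$$Y(\omega) = P(B \mid \mathcal{G})(\omega)\, P(C \mid \mathcal{G})(\omega).$$
We complete the proof by using Definition 1 to show that $Y = P(B \cap C \mid \mathcal{G})$ a.s.
It is clear that $Y \in L_2(\Omega, \mathcal{G}, P)$. Let $Z \in L_2(\Omega, \mathcal{G}, P)$. Then
$$E((Y - I_B I_C) Z) = E((P(B \mid \mathcal{G}) - I_B)\, P(C \mid \mathcal{G})\, Z) + E((P(C \mid \mathcal{G}) - I_C)\, I_B Z)$$
$$= 0 + E((P(C \mid \mathcal{H}) - I_C)\, I_B Z) = 0,$$
since $P(C \mid \mathcal{G}) Z$ is $\mathcal{G}$-measurable and $I_B Z$ is $\mathcal{H}$-measurable. □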


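Corollary 28. Let $\mathcal{G} \subseteq \mathcal{H}$ and $\mathcal{K}$ be three $\sigma$-fields of events in a probability
space. Suppose that $\mathcal{H}$ and $\mathcal{K}$ are independent. Then $\mathcal{H}$ and $\sigma(\mathcal{G}, \mathcal{K})$ are
conditionally independent given $\mathcal{G}$.


PROOF. We apply Proposition 27 with $\mathcal{G}$, $\mathcal{H}$, and $\sigma(\mathcal{G}, \mathcal{K})$ of the corollary
playing the roles of $\mathcal{G}$, $\mathcal{H}$, and $\mathcal{K}$, respectively, in that proposition. According to
the proposition we only need show $P(C \mid \mathcal{G}) = P(C \mid \mathcal{H})$ for every $C \in \sigma(\mathcal{G}, \mathcal{K})$.
By Proposition 6, the collection of those $C$ for which this equality holds is a
Sierpiński class, so it suffices to prove that $P(D \cap E \mid \mathcal{G}) = P(D \cap E \mid \mathcal{H})$
whenever $D \in \mathcal{G}$ and $E \in \mathcal{K}$. Since $D \in \mathcal{G}$ and, therefore, also $D \in \mathcal{H}$,

(21.6) $$P(D \cap E \mid \mathcal{G}) = I_D\, P(E \mid \mathcal{G});$$
(21.7) $$P(D \cap E \mid \mathcal{H}) = I_D\, P(E \mid \mathcal{H}).$$

To complete the proof that the right sides and therefore the left sides of (21.6)
and (21.7) equal each other, we need only observe that since $\mathcal{H}$ and $\mathcal{K}$ are
independent, it follows that $\mathcal{G}$ and $\mathcal{K}$ are independent and, therefore,
$$P(E \mid \mathcal{G}) = P(E) = P(E \mid \mathcal{H}) \quad \text{a.s.} \quad □$$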


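Problem 45. In view of Proposition 27 one is tempted to make a conjecture that
treats $\mathcal{H}$ and $\mathcal{K}$ symmetrically. Suppose that, besides $\mathcal{G}$, $\mathcal{H}$, and $\mathcal{K}$ as given in that
proposition, a fourth $\sigma$-field $\mathcal{L} \subseteq \mathcal{K}$ is involved. Then one might conjecture that if
$$P(C \mid \mathcal{G}) = P(C \mid \mathcal{H}) \quad \text{a.s.}$$
for every $C \in \mathcal{K}$ and
$$P(B \mid \mathcal{L}) = P(B \mid \mathcal{K}) \quad \text{a.s.}$$
for every $B \in \mathcal{H}$, then $\mathcal{H}$ and $\mathcal{K}$ are conditionally independent given $\sigma(\mathcal{G}, \mathcal{L})$. Use
the probability space for some fair coins to show that this conjecture is not true.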


We now have a tool to prove that the past and future of a random walk are 
conditionally independent given the present. 


Theorem 29. Let $(S_0 = 0, S_1, \dots)$ be a random walk. Then, for every $n$, the
vector $(S_0, S_1, \dots, S_n)$ and the sequence $(S_n, S_{n+1}, \dots)$ are conditionally
independent given $\sigma(S_n)$.


PROOF. Denote the steps of the random walk by $X_1, X_2, \dots$, and set
$$\mathcal{G} = \sigma(S_n), \quad \mathcal{H} = \sigma(X_1, X_2, \dots, X_n), \quad \mathcal{K} = \sigma(X_{n+1}, X_{n+2}, \dots).$$
It is clear that $\mathcal{G}$, $\mathcal{H}$, and $\mathcal{K}$ satisfy the hypotheses of Corollary 28. From
that result we conclude that $\mathcal{H}$ and $\sigma(\mathcal{G}, \mathcal{K})$ are conditionally independent given
$\mathcal{G}$. This conclusion and the observations that $(S_0, \dots, S_n)$ is $\mathcal{H}$-measurable and
$(S_n, S_{n+1}, \dots)$ is $\sigma(\mathcal{G}, \mathcal{K})$-measurable complete the proof. □
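
The statement of Theorem 29 can be confirmed by brute force for a short walk. The sketch below makes assumptions that are not in the text: a one-dimensional walk with a hypothetical step distribution on $\{-1, 1\}$, a finite horizon, and a hypothetical function name. It checks that, on each event $[S_n = s]$, the conditional probability of a past path together with a future path equals the product of the two conditional probabilities.

```python
from fractions import Fraction
from itertools import product

# Hypothetical step distribution, chosen asymmetric on purpose.
STEP_LAW = {-1: Fraction(1, 3), 1: Fraction(2, 3)}

def path_law(length):
    """Exact law of (S_0, ..., S_length) as a dict {path: probability}."""
    law = {}
    for steps in product(STEP_LAW, repeat=length):
        path, p = [0], Fraction(1)
        for x in steps:
            path.append(path[-1] + x)
            p *= STEP_LAW[x]
        law[tuple(path)] = p
    return law

def past_future_factorizes(n, horizon):
    """Check P(past = a, future = b | S_n) = P(past = a | S_n) P(future = b | S_n)
    for all pasts a = (S_0..S_n) and futures b = (S_n..S_horizon)."""
    law = path_law(horizon)
    pasts = {path[:n + 1] for path in law}
    futures = {path[n:] for path in law}

    def prob(pred):
        return sum(p for path, p in law.items() if pred(path))

    for s in {path[n] for path in law}:
        p_s = prob(lambda w: w[n] == s)
        for a in pasts:
            for b in futures:
                p_a = prob(lambda w: w[n] == s and w[:n + 1] == a) / p_s
                p_b = prob(lambda w: w[n] == s and w[n:] == b) / p_s
                p_ab = prob(lambda w: w[n] == s
                            and w[:n + 1] == a and w[n:] == b) / p_s
                if p_ab != p_a * p_b:
                    return False
    return True

result = past_future_factorizes(2, 4)
```

Events of the form "past path equals a" and "future path equals b" generate the two finite $\sigma$-fields involved, which is why checking them suffices in this truncated setting.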


The preceding theorem gives another illustration (besides the rather trivial 
Proposition 24) that conditional independence does not imply independence. 
The property obtained for random walks in the theorem is quite important. 
Many interesting random sequences, other than random walks, have this prop- 
erty. 


Definition 30. A random sequence $(Y_m : m \ge 0)$ adapted to a filtration
$(\mathcal{F}_m : m \ge 0)$ is a Markov sequence with respect to that filtration if, for each
nonnegative integer $n$, $\mathcal{F}_n$ and $\sigma(Y_m : m \ge n)$ are conditionally independent
given $\sigma(Y_n)$.




For the preceding definition, the minimal filtration is implicit if no filtration
is mentioned explicitly. According to Theorem 29, a random walk is a Markov
sequence. Regarding $n$ in the definition as indicating the present, a Markov
sequence is one in which the past (represented by $\mathcal{F}_n$) and the future (represented
by $\sigma(Y_m : m \ge n)$) are conditionally independent given the present. The next
proposition describes an equivalent view: a Markov sequence is one in which the
conditional probability of a future event given the past equals its conditional
probability given the present.


Proposition 31. Let $Y = (Y_m : m \ge 0)$ be a random sequence adapted to a
filtration $(\mathcal{F}_m : m \ge 0)$. Then the following two conditions are equivalent:
(i) $Y$ is Markov with respect to $(\mathcal{F}_n : n \ge 0)$;
(ii) for each $n$ and each $C \in \sigma(Y_n, Y_{n+1}, \dots)$,
$$P(C \mid \sigma(Y_n)) = P(C \mid \mathcal{F}_n) \quad \text{a.s.}$$


Problem 46. Prove the preceding proposition. Hint: Use Proposition 27. 


Problem 47. Show that Proposition 31 remains true if, in condition (ii), we only
require that $C \in \sigma(Y_{n+1}, Y_{n+2}, \dots)$. Hint: Use Proposition 25.


We will have more to say about Markov sequences in Chapters 22 and 26. 


21.6. † Conditional distributions of normal random vectors


We conclude this chapter with an illustrative example. 


Example 4. Let $X = (X_1, \dots, X_d)$ be an $\mathbb{R}^d$-valued normal random variable,
with mean vector 0 and covariance matrix $\Sigma$, and let $Y_1 = (X_1, \dots, X_k)$ and
$Y_2 = (X_{k+1}, \dots, X_d)$, where $k$ is an integer strictly between 0 and $d$. We wish to
find the conditional distribution of $Y_1$ given $Y_2$. For simplicity, we will assume
that the covariance matrix of $Y_2$ is strictly positive definite, and thus invertible.

Let us write the covariance matrix of $X$ as
$$\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}.$$
Regarding $Y_1$ and $Y_2$ as random row matrices, we have the following formula for
the matrices $\Sigma_{ij}$:
$$\Sigma_{ij} = E(Y_i^T Y_j),$$
where the expectation is taken term by term. In particular, $\Sigma_{ii}$ is the covariance
matrix of $Y_i$, $i = 1, 2$.

Let $Z_1 = Y_1 - Y_2 \Sigma_{22}^{-1} \Sigma_{21}$, so that $Z_1^T = Y_1^T - \Sigma_{12} \Sigma_{22}^{-1} Y_2^T$. Since $X$ is normal,
the $\mathbb{R}^d$-valued random variable $(Z_1, Y_2)$ is also normal. Note that
$$E(Z_1^T Y_2) = \Sigma_{12} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{22} = 0,$$
where 0 stands for the $k \times (d - k)$ zero matrix. Thus, the characteristic function
of $(Z_1, Y_2)$ factors into the product of the characteristic functions of $Z_1$ and $Y_2$
and hence $Z_1$ and $Y_2$ are independent.

By Proposition 12, the conditional distribution of $(Z_1, Y_2)$ given $Y_2$ is the product
of $\delta_{Y_2}$ and the unconditional distribution of $Z_1$. Since $Y_1 = Z_1 + Y_2 \Sigma_{22}^{-1} \Sigma_{21}$,
it follows from Proposition 11 that the conditional distribution of $Y_1$ given $Y_2$ is
normal with mean vector $Y_2 \Sigma_{22}^{-1} \Sigma_{21}$ and covariance matrix equal to the covariance
matrix of $Z_1$:
$$E(Z_1^T Z_1) = E(Y_1^T Y_1) - E(\Sigma_{12} \Sigma_{22}^{-1} Y_2^T Y_1) - E(Y_1^T Y_2 \Sigma_{22}^{-1} \Sigma_{21}) + E(\Sigma_{12} \Sigma_{22}^{-1} Y_2^T Y_2 \Sigma_{22}^{-1} \Sigma_{21})$$
$$= \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}.$$
In particular, the conditional distribution of $Y_1$ given $Y_2$ is normal.
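
The two matrix formulas of Example 4 are easy to evaluate exactly. The sketch below is illustrative only: the covariance matrix is a made-up $3 \times 3$ positive definite example with $k = 1$, the helper names are hypothetical, and the coefficient matrix is written for row vectors as $M = \Sigma_{22}^{-1} \Sigma_{21}$, so that the conditional mean is $Y_2 M$ (the transpose of $\Sigma_{12} \Sigma_{22}^{-1} Y_2^T$).

```python
from fractions import Fraction

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def matsub(A, B):
    """Entrywise difference A - B."""
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inverse(A):
    """Gauss-Jordan inverse in exact rational arithmetic.
    Assumes A is square and invertible."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[pivot] = M[pivot], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c:
                M[r] = [vr - M[r][c] * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]

def conditional_parameters(sigma, k):
    """Return (M, C, blocks): the conditional mean of Y_1 given Y_2 is Y_2 M,
    and the conditional covariance is C = S11 - S12 S22^{-1} S21."""
    s11 = [row[:k] for row in sigma[:k]]
    s12 = [row[k:] for row in sigma[:k]]
    s21 = [row[:k] for row in sigma[k:]]
    s22 = [row[k:] for row in sigma[k:]]
    M = matmul(inverse(s22), s21)
    C = matsub(s11, matmul(s12, M))
    return M, C, (s11, s12, s21, s22)

# Hypothetical covariance matrix (symmetric, positive definite).
sigma = [[4, 2, 1],
         [2, 3, 1],
         [1, 1, 2]]
M, C, (s11, s12, s21, s22) = conditional_parameters(sigma, 1)

# E(Z_1^T Y_2) = S12 - M^T S22 should be the zero matrix.
uncorrelated = matmul(transpose(M), s22) == s12
```

Rational arithmetic is used so that the orthogonality check is an exact identity; for numerical work a linear algebra library would be the more usual choice.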


Problem 48. Work out the formulas for the mean and variance of the conditional 
distribution in the preceding example for the special case in which d = 2,k = 1, 
and 


with $|\rho| < 1$. Also discuss the situation where $\rho = \pm 1$.


Problem 49. For the special case $k = 1$ of Example 4, show that the variance of
the conditional distribution is less than or equal to the variance of $Y_1$.


Problem 50. (intended for readers with sufficient knowledge of linear algebra) The
formulas for the mean and variance of the conditional distribution in the example
are still valid when $\Sigma_{22}$ is not invertible, provided we replace $\Sigma_{22}^{-1}$ by a matrix $\Sigma_{22}^{+}$,
known as the pseudoinverse of $\Sigma_{22}$. To define this matrix, write $\Sigma_{22} = O^T D O$,
where $O$ is an orthogonal matrix and $D$ is a diagonal matrix. Let $D^{+}$ be the
matrix obtained by replacing each nonzero element of $D$ by its reciprocal, and let
$\Sigma_{22}^{+} = O^T D^{+} O$. Show that in the invertible case, the pseudoinverse equals the
inverse. Then show that the modified formula is valid for the noninvertible case.
Hint: A look at the proof shows that it suffices to show $\Sigma_{12} = \Sigma_{12} \Sigma_{22}^{+} \Sigma_{22}$.


CHAPTER 22 
Construction of Random Sequences 


There are situations in which it is natural to construct probability spaces by 
first specifying certain conditional distributions and then showing that there is 
a unique underlying (unconditional) distribution consistent with those specifi- 
cations. The tool for doing this is Theorem 3. Several specific examples are 
included along with three general classes of examples: exchangeable sequences, 
Markov sequences, and Polya urns. 


22.1. The basic result 


We begin with an elementary example. 


Example 1. Consider two urns labelled H and T. In urn H there are 3 
identical blue balls and 5 identical green balls. In urn T there are 7 identical 
blue balls and 2 identical green balls. A fair coin is flipped. If it shows heads, a 
ball is drawn at random from urn H. If it shows tails, a ball is drawn at random 
from urn T. 

Let $X_0$ be the outcome of the coin toss (H or T), and let $X_1$ be the color of
the ball drawn (B or G). We consider $X_0$ and $X_1$ to be two random variables
defined on an appropriate probability space. The distribution of $X_0$ is clear from
the description: $P[X_0 = H] = P[X_0 = T] = \frac{1}{2}$. But the distribution of $X_1$ is not
directly available from the description. Instead, it is the conditional distribution
of $X_1$ given $\sigma(X_0)$ that is easily obtained:


$$P[X_1 = B \mid \sigma(X_0)](\omega) = \begin{cases} \frac{3}{8} & \text{if } X_0(\omega) = H \\ \frac{7}{9} & \text{if } X_0(\omega) = T \end{cases}$$
and
$$P[X_1 = G \mid \sigma(X_0)](\omega) = \begin{cases} \frac{5}{8} & \text{if } X_0(\omega) = H \\ \frac{2}{9} & \text{if } X_0(\omega) = T. \end{cases}$$




The (unconditional) distribution of the pair $(X_0, X_1)$ can now be obtained
from condition (ii) in Proposition 2 of Chapter 21. We must have


$$\begin{aligned}
P[X_0 = H, X_1 = B] &= \tfrac{1}{2} \cdot \tfrac{3}{8} = \tfrac{3}{16}, \\
P[X_0 = H, X_1 = G] &= \tfrac{1}{2} \cdot \tfrac{5}{8} = \tfrac{5}{16}, \\
P[X_0 = T, X_1 = B] &= \tfrac{1}{2} \cdot \tfrac{7}{9} = \tfrac{7}{18}, \\
P[X_0 = T, X_1 = G] &= \tfrac{1}{2} \cdot \tfrac{2}{9} = \tfrac{1}{9}.
\end{aligned}$$


Thus, the unconditional distribution of $X_0$ together with the conditional distribution
of $X_1$ given $\sigma(X_0)$, both of which arise naturally from the description of
the experiment, uniquely determine the unconditional distribution of $(X_0, X_1)$.
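
The two-stage calculation of Example 1 can be sketched in a few lines. The dictionaries `R0` and `R1` below are illustrative encodings (not notation from the text) of the distribution of the coin toss and of the conditional distribution of the ball color given the coin toss.

```python
from fractions import Fraction as F

# Distribution of the coin toss X0.
R0 = {"H": F(1, 2), "T": F(1, 2)}
# Conditional distribution of the color X1 given X0 (urn H: 3 blue, 5 green; urn T: 7 blue, 2 green).
R1 = {"H": {"B": F(3, 8), "G": F(5, 8)},
      "T": {"B": F(7, 9), "G": F(2, 9)}}

def joint(R0, R1):
    # Joint distribution: P[X0 = x0, X1 = x1] = R0(x0) * R1(x0, x1).
    return {(x0, x1): p0 * p1
            for x0, p0 in R0.items()
            for x1, p1 in R1[x0].items()}

Q = joint(R0, R1)

# Unconditional distribution of X1, obtained by summing over X0.
marginal_X1 = {}
for (x0, x1), p in Q.items():
    marginal_X1[x1] = marginal_X1.get(x1, 0) + p

print(Q)
print(marginal_X1)
```

The marginal of the color, which the description of the experiment does not give directly, falls out of the joint distribution.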


Here is a result that generalizes the construction described in the preceding 
example. 


Lemma 1. Let $(\Psi_0, \mathcal{G}_0)$ and $(\Psi_1, \mathcal{G}_1)$ be two measurable spaces, let $R_0$ denote
a probability measure on $(\Psi_0, \mathcal{G}_0)$, and let $x_0 \rightsquigarrow R_1(x_0, \cdot)$ be a random distribution
on $(\Psi_1, \mathcal{G}_1)$ whose domain is the probability space $(\Psi_0, \mathcal{G}_0, R_0)$. Then there
is a unique distribution $Q$ on $(\Psi_0 \times \Psi_1, \mathcal{G}_0 \times \mathcal{G}_1)$ such that if $X = (X_0, X_1)$ is any
$(\Psi_0 \times \Psi_1)$-valued random variable having distribution $Q$, then $R_0$ is the distribution
of $X_0$ and $R_1$ is a conditional distribution of $X_1$ given $\sigma(X_0)$. Moreover,
$Q$ is given by

$$Q(A) = \int_{\Psi_0} \int_{\Psi_1} I_A(x_0, x_1)\, R_1(x_0, dx_1)\, R_0(dx_0) \tag{22.1}$$

for $A \in \mathcal{G}_0 \times \mathcal{G}_1$.


PROOF. We begin by showing that a probability measure $Q$ can be defined
by (22.1). The interior integral in (22.1) is well-defined and nonnegative since,
for each $x_0$, the function $I_A(x_0, \cdot)$ is nonnegative and measurable.

To prove that the interior integral is a measurable function of $x_0$ for all
$A \in \mathcal{G}_0 \times \mathcal{G}_1$, consider the collection of all $A$'s for which this measurability property
holds. Clearly, this collection contains sets of the form $A_0 \times A_1$, where $A_i \in \mathcal{G}_i$.
A straightforward application of the Sierpiński Class Theorem now gives the
desired measurability for general $A$.

Notice that $Q$ is nonnegative and $Q(\Psi_0 \times \Psi_1) = 1$. That $Q$ is countably
additive follows from the linearity of integration and the Monotone Convergence
Theorem. Hence $Q$ is a probability measure on $(\Psi_0 \times \Psi_1, \mathcal{G}_0 \times \mathcal{G}_1)$. Therefore $Q$
is also the distribution of the random variable $X$ defined by $X(\psi_0, \psi_1) = (\psi_0, \psi_1)$
on the probability space $(\Psi_0 \times \Psi_1, \mathcal{G}_0 \times \mathcal{G}_1, Q)$.

That the distribution of $X_0$ is $R_0$ follows from setting $A = A_0 \times \Psi_1$ in (22.1)
for an arbitrary $A_0 \in \mathcal{G}_0$.

We now check that $(\psi_0, \psi_1) \rightsquigarrow R_1(X_0(\psi_0, \psi_1), \cdot)$ is a conditional distribution
of $X_1$ given $\sigma(X_0)$, or equivalently, that for each $B \in \mathcal{G}_1$,
$(\psi_0, \psi_1) \rightsquigarrow R_1(X_0(\psi_0, \psi_1), B)$ is a conditional probability of $X_1^{-1}(B)$ given $\sigma(X_0)$. Clearly,




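such a function is $\sigma(X_0)$-measurable. To check the second condition in Proposition 2
of Chapter 21, choose $C \in \sigma(X_0)$. Then


$$C = \{(\psi_0, \psi_1): X_0(\psi_0, \psi_1) \in D\}$$
for some $D \in \mathcal{G}_0$. Thus,


$$\begin{aligned}
\int_{\Psi_0 \times \Psi_1} & R_1(X_0(\psi_0, \psi_1), B)\, I_C(\psi_0, \psi_1)\, Q(d(\psi_0, \psi_1)) \\
&= \int_{\Psi_0} R_1(x_0, B)\, I_D(x_0)\, R_0(dx_0) \\
&= \int_{\Psi_0} \int_{\Psi_1} I_{D \times B}(x_0, x_1)\, R_1(x_0, dx_1)\, R_0(dx_0) \\
&= Q(D \times B) = Q(C \cap \{(\psi_0, \psi_1): X_1(\psi_0, \psi_1) \in B\}),
\end{aligned}$$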


as desired. 

Now let $X = (X_0, X_1)$ be any random vector having the property that $R_0$ is
the distribution of $X_0$ and $R_1$ is a conditional distribution of $X_1$ given $\sigma(X_0)$.
By condition (ii) of Proposition 2 of Chapter 21, its distribution $\tilde{Q}$ agrees with
$Q$, defined by (22.1), for $A$ of the form $A_0 \times A_1$, where $A_i \in \mathcal{G}_i$. Now the
Uniqueness of Measure Theorem (Theorem 3 of Chapter 7) implies that $\tilde{Q}$ and
$Q$ are the same probability measure on $\mathcal{G}_0 \times \mathcal{G}_1$. □


It is not hard to see how to extend the preceding lemma inductively to give
a method for constructing finite random sequences $(X_0, \dots, X_n)$ of arbitrary
length in terms of conditional distributions. For the inductive step, one is given
the unconditional distribution of $(X_0, \dots, X_k)$ and a conditional distribution
of $X_{k+1}$ given $\sigma(X_0, \dots, X_k)$. Then one applies the lemma with the random
variable $(X_0, \dots, X_k)$ playing the role of $X_0$ and $X_{k+1}$ playing the role of $X_1$ in
that lemma, thus obtaining the unconditional distribution of $(X_0, \dots, X_{k+1})$.

In order to construct infinite sequences using conditional distributions, we will 
mimic the construction of infinite product measure in Chapter 9. We need the 
following generalization of the Fubini Theorem, which follows from Lemma 1. 


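Theorem 2. [Conditional Fubini] Let $(\Psi_0, \mathcal{G}_0)$ and $(\Psi_1, \mathcal{G}_1)$ be two measurable
spaces and let


$$(\Omega, \mathcal{F}) = (\Psi_0, \mathcal{G}_0) \times (\Psi_1, \mathcal{G}_1).$$


Let $R_0$, $R_1$, and $Q$ be as in Lemma 1. If $f$ is an $\mathbb{R}$-valued measurable function
defined on $(\Omega, \mathcal{F}, Q)$ whose integral with respect to $Q$ exists, then the function


$$x_0 \rightsquigarrow \int_{\Psi_1} f(x_0, x_1)\, R_1(x_0, dx_1)$$


is an $R_0$-almost surely defined $\mathcal{G}_0$-measurable function, and


$$\int_{\Omega} f\, dQ = \int_{\Psi_0} \int_{\Psi_1} f(x_0, x_1)\, R_1(x_0, dx_1)\, R_0(dx_0). \tag{22.2}$$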




Problem 1. Prove the preceding theorem.


We might write (22.2) as 


$$E(f \circ (X_0, X_1)) = E\left( \int_{\Psi_1} f(X_0, x_1)\, R_1(X_0, dx_1) \right). \tag{22.3}$$


The right side can be viewed as the iteration of two expectations, the ‘interior
expectation’ being the ‘conditional expectation’ of $f \circ (X_0, X_1)$ given $X_0$. It is
easy to extend Theorem 2 inductively to treat integration on products of finitely
many measurable spaces.

Here is the construction result for infinite sequences. 


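Theorem 3. Let $((\Psi_n, \mathcal{G}_n), n \ge 0)$ be a sequence of measurable spaces. Let
$R_0$ be a probability measure on $\mathcal{G}_0$, and for each $n \ge 0$, let $R_{n+1}$ be a measurable
function from $(\Psi_0, \mathcal{G}_0) \times \cdots \times (\Psi_n, \mathcal{G}_n)$ to the measurable space of probability
measures on $(\Psi_{n+1}, \mathcal{G}_{n+1})$. Then there exists a probability space $(\Omega, \mathcal{F}, P)$ and
a random sequence $X = (X_0, X_1, \dots)$ defined on that space such that the distribution
of $X_0$ is $R_0$ and, for $n \ge 0$, a conditional distribution of $X_{n+1}$ given
$\sigma(X_0, \dots, X_n)$ is given by


$$\omega \rightsquigarrow R_{n+1}((X_0(\omega), \dots, X_n(\omega)), \cdot).$$
The distribution of $X$ is uniquely determined by the relations
$$\begin{aligned}
P[(X_0, &\dots, X_n) \in A_n] \\
&= \int_{\Psi_0} \cdots \int_{\Psi_n} I_{A_n}(x_0, \dots, x_n)\, R_n((x_0, \dots, x_{n-1}), dx_n) \cdots R_1(x_0, dx_1)\, R_0(dx_0),
\end{aligned} \tag{22.4}$$


$n \in \mathbb{Z}^+$ and $A_n \in \mathcal{G}_0 \times \cdots \times \mathcal{G}_n$.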


Problem 2. Prove the preceding theorem. Hint: Look at the proof of Theorem 16 
of Chapter 9. 


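Problem 3. In the setting of Theorem 3, show that for $n = 0, 1, \dots$, and
$k = 0, \dots, n-1$,


$$\begin{aligned}
P[(X_{k+1}, &\dots, X_n) \in A \mid \sigma(X_0, \dots, X_k)] \\
&= \int_{\Psi_{k+1}} \cdots \int_{\Psi_n} I_A(x_{k+1}, \dots, x_n)\, R_n((X_0, \dots, X_k, x_{k+1}, \dots, x_{n-1}), dx_n) \\
&\qquad \cdots R_{k+1}((X_0, \dots, X_k), dx_{k+1}),
\end{aligned}$$


$A \in \mathcal{G}_{k+1} \times \cdots \times \mathcal{G}_n$. Hint: In checking the second of the two conditions that characterize
conditional probabilities, use the Conditional Fubini Theorem repeatedly
to calculate the relevant expected value.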




We often take the point of view that a random sequence $X$ represents the
sequence of random values taken by some system that evolves in time, with $X_n$
being the value at time $n$. The preceding result then says that the distribution
of $X$ can be specified by giving the ‘initial distribution’ (the distribution of $X_0$)
and, for each time $n$, giving the conditional distribution of the ‘present’ value
$X_n$ in terms of the ‘past’ values $X_0, \dots, X_{n-1}$.
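
The ‘initial distribution plus conditional distributions given the past’ recipe translates directly into a sampling loop. The following is a minimal sketch; the names `sample_sequence`, `sample_initial`, and `sample_next` are hypothetical, and the history-dependent rule used as an illustration is a non-returning nearest-neighbor walk on $\mathbb{Z}^2$ (a particle that never revisits a site and stops when it has nowhere to go).

```python
import random

def sample_sequence(sample_initial, sample_next, n, rng):
    # Draw X0 from the initial distribution, then draw each X_{j+1}
    # from a distribution that may depend on the entire past (X0, ..., Xj).
    path = [sample_initial(rng)]
    for _ in range(n):
        path.append(sample_next(tuple(path), rng))
    return path

def non_returning_step(history, rng):
    # Move with equal probability to any unvisited site at distance 1;
    # stay put forever if no such site exists.
    x, y = history[-1]
    visited = set(history)
    free = [p for p in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1))
            if p not in visited]
    return rng.choice(free) if free else (x, y)

rng = random.Random(0)
path = sample_sequence(lambda r: (0, 0), non_returning_step, 50, rng)
print(path[:5])
```

The point of the sketch is that `sample_next` receives the whole past, which is exactly the generality allowed for the functions $R_{n+1}$ in Theorem 3. The sketch only simulates; it does not replace the existence argument.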


Problem 4. [Destructive Random Walk] Show that there exists a sequence of $\mathbb{Z}^2$-valued
random variables which fits the following informal description. Let $X_n$ be
the position at time $n \ge 0$ of a particle that starts at the origin at time 0. The
particle moves somewhat like a simple symmetric random walk, except that it is
not allowed to return to a previously visited point. One may think of the points of
$\mathbb{Z}^2$ as locations which are destroyed when visited by the particle. If the particle is
at location $x$, it moves with equal probability to any undestroyed location that is
distance 1 from $x$. If there are no such locations, the particle stops moving forever.
(Note that the sequence described is actually not a random walk.)


Problem 5. [Random Walk with Reinforcement] Let G be the complete graph on 
three vertices. In other words, G is the graph that looks like a triangle; it contains 
three vertices and three edges, with one edge connecting each pair of vertices. 
Show that there exists a sequence $(X_0, X_1, \dots)$ whose terms take values in the
set of vertices of G and which fits the following informal description. Each edge
of G is initially labeled by some positive integer and $X_0$ equals each of the three
vertices with probability 1/3. At each stage, the particle moves from its current
position to one of the other two vertices along the corresponding connecting edge. 
The probability that it chooses to travel along a given edge is proportional to the 
integer that labels that edge. Each time the particle travels along an edge, the 
integer that labels that edge is increased by 1. (Note that the sequence described 
is actually not a random walk.) 


22.2. Construction of exchangeable sequences 


In this section, we construct a certain rather special but important class of 
random sequences. It turns out that in this case, we only need Lemma 1 rather 
than the more powerful Theorem 3. 


Example 2. [Random iid sequences] Let $\Psi$ denote a Borel space with $\sigma$-field
$\mathcal{G}$, and let $(\mathcal{Q}, \mathcal{H})$ denote the measurable space of probability measures on $(\Psi, \mathcal{G})$.
Let $R_0$ be a probability measure on $(\mathcal{Q}, \mathcal{H})$. By Lemma 1, there exists a pair
$(Q, X)$ of random variables defined on some probability space $(\Omega, \mathcal{F}, P)$ such that
the distribution of the random variable $Q$ is $R_0$ and the conditional distribution
of $X$ given $\sigma(Q)$ is


$$\bigotimes_{n=1}^{\infty} Q,$$


the infinite product of $Q$ with itself. Note that $Q$ is a random distribution on
$(\Psi, \mathcal{G})$ and $X$ is a $(\Psi, \mathcal{G})^{\infty}$-valued random variable, or in other words,
$X = (X_1, X_2, \dots)$ is a random sequence of $(\Psi, \mathcal{G})$-valued random variables.

It is easy to check that $X$ is a conditionally iid sequence given $\sigma(Q)$. One
thinks of a two-stage experiment in which one first chooses a distribution $Q$ on
$(\Psi, \mathcal{G})$ at random, and then forms an iid sequence $X$ with common distribution
$Q$. For this reason, $X$ is sometimes called a ‘random iid sequence’, even though
this term may be somewhat misleading. The sequence $X$ is not independent,
but is instead only conditionally independent (given $\sigma(Q)$).
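
A small exact computation illustrates why such a sequence need not be independent. The sketch below assumes one particular two-stage experiment, chosen only for illustration: a number $u$ is drawn uniformly from $[0,1]$ and then an iid Bernoulli sequence with parameter $u$ is formed. The probability of a given pattern of $r$ ones and $k - r$ zeros is then $\int_0^1 u^r (1-u)^{k-r}\, du = r!\,(k-r)!/(k+1)!$.

```python
from fractions import Fraction as F
from math import factorial

def prob_pattern(eps):
    # Probability that the first len(eps) terms equal the 0/1 pattern eps,
    # when the Bernoulli parameter is itself uniform on [0, 1] (assumed setup).
    k, r = len(eps), sum(eps)
    return F(factorial(r) * factorial(k - r), factorial(k + 1))

p1 = prob_pattern((1,))        # P[X1 = 1]
p11 = prob_pattern((1, 1))     # P[X1 = 1, X2 = 1]
print(p1, p11, p1 * p1)
```

Since `p11` differs from `p1 * p1`, the first two terms are not independent, although they are conditionally independent given the parameter.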


Problem 6. Describe the distribution $R_0$ of Example 2 for the experiment of flip-
ping a coin an infinite sequence of times, in a setting where the probability of heads 
is not known, but rather uniformly distributed on [0, 1]. 


Problem 7. Describe the distribution $R_0$ of Example 2 for the experiment of ob-
serving an infinite sequence of normally distributed random variables, the mean 
and standard deviation of which are not known, but rather distributed like a pair 
(Y,|Z|), where (Y, Z) is an independent pair of standard normal random variables. 


Definition 4. Let $\mathcal{I}$ be a countable set. A sequence $(X_i: i \in \mathcal{I})$ (finite or
infinite) of random variables on a probability space $(\Omega, \mathcal{F}, P)$ is exchangeable if,
for every permutation $\rho$ of $\mathcal{I}$, the distributions of $(X_{\rho(i)}: i \in \mathcal{I})$ and $(X_i: i \in \mathcal{I})$
are identical.


The reader can check that to verify exchangeability one only need verify the
condition in Definition 4 for permutations $\rho$ having the property that $\rho(i) = i$ for
all but finitely many $i$. Notice that a finite or infinite iid sequence is exchangeable.
The next problem generalizes this fact to conditionally iid sequences.


Problem 8. Show that any sequence that is conditionally iid given some o-field is 
exchangeable. 


Problem 9. An urn contains r red balls, w white balls, and y yellow balls. The 
experiment consists of randomly drawing balls one at a time from the urn without 
replacement until all the balls have been drawn, yielding a random sequence X = 
(X1, X2,..., Xn), where n = r+w +y. Prove that the sequence X is exchangeable. 


For the Borel space setting it will be shown in Chapter 27 that every exchange- 
able infinite sequence is conditionally iid with respect to some o-field, and that 
in fact, every such sequence can be constructed in the manner illustrated by 
Example 2. 

Here is a simple example of a finite exchangeable sequence that is not conditionally
iid with respect to any $\sigma$-field. The example consists of two random
variables $X_1$ and $X_2 = -X_1$, where $X_1$ equals $\pm 1$ according to a fair coin flip.
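
For finite sequences with finitely many values, Definition 4 can be checked by brute force. The following is a minimal sketch; the function names and the dictionary encoding of a distribution are illustrative choices, not notation from the text. It tests the pair $(X_1, -X_1)$ just described and a small urn drawn without replacement.

```python
from fractions import Fraction as F
from itertools import permutations

def is_exchangeable(dist):
    # dist maps outcome tuples (all of the same length) to probabilities.
    # The distribution is exchangeable if permuting coordinates never changes a probability.
    n = len(next(iter(dist)))
    for perm in permutations(range(n)):
        for outcome, p in dist.items():
            permuted = tuple(outcome[i] for i in perm)
            if dist.get(permuted, 0) != p:
                return False
    return True

def urn_without_replacement(counts):
    # Distribution of the color sequence when all balls are drawn in uniformly random order.
    balls = [color for color, k in counts.items() for _ in range(k)]
    orders = list(permutations(range(len(balls))))
    dist = {}
    for order in orders:
        seq = tuple(balls[i] for i in order)
        dist[seq] = dist.get(seq, 0) + F(1, len(orders))
    return dist

flip = {(1, -1): F(1, 2), (-1, 1): F(1, 2)}          # (X1, X2) with X2 = -X1
urn = urn_without_replacement({"r": 2, "w": 1, "y": 1})
fixed = {(0, 1): F(1)}                                # a non-exchangeable pair

print(is_exchangeable(flip), is_exchangeable(urn), is_exchangeable(fixed))
```

A check of this kind is only a finite verification for particular numbers of balls; it is not a substitute for the general arguments requested in the problems.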




* Problem 10. Show that the sequence (X1, X2) defined in the preceding paragraph 
is not conditionally iid and that it also is not the beginning of a three-term ex- 
changeable sequence. 


Definition 5. Let $\mathcal{I}$ be a countable set. A sequence $(X_i: i \in \mathcal{I})$ (finite
or infinite) of random variables having values in a Borel space is conditionally
exchangeable given a $\sigma$-field $\mathcal{G}$ if, for every permutation $\rho$ of $\mathcal{I}$, the conditional
distributions of $(X_i: i \in \mathcal{I})$ given $\mathcal{G}$ and $(X_{\rho(i)}: i \in \mathcal{I})$ given $\mathcal{G}$ are the same.


The next proposition generalizes Problem 8. It says that conditional ex- 
changeability implies exchangeability. 


Proposition 6. A finite or infinite sequence of random variables that is con- 
ditionally exchangeable given some o-field is (unconditionally) exchangeable. 


* Problem 11. Prove the preceding proposition. 
Problem 12. In connection with Definition 5, comment on the modification of 
Problem 9 in which the numbers of balls of each color are random and not neces- 
sarily independent. 
Problem 13. Let $(X_1, X_2, \dots, X_n)$ be an exchangeable sequence of $\mathbb{R}^d$-valued random
variables. Let $S = \sum_{i=1}^{n} X_i$. Prove that the sequence $(X_1, X_2, \dots, X_n)$ is
conditionally exchangeable given $\sigma(S)$.

* 


Problem 14. Consider a simple random walk $S = (S_0 = 0, S_1, S_2, \dots)$ in $\mathbb{Z}$ whose
step distribution assigns probability $p$ to $\{1\}$ and probability $(1-p)$ to $\{-1\}$, where
$0 < p < 1$. Use Problem 13 to find the conditional distribution given $\sigma(S_n)$ of the
vector consisting of the first $n$ steps of $S$.


Problem 15. The formulas for the conditional probabilities in the solution of the
preceding problem do not depend on $p$. Discuss this fact. Also, discuss the modifications,
if any, needed in that problem to accommodate the cases $p = 1$ and
$p = 0$.


* Problem 16. Let $U$ be distributed according to the standard beta distribution
with parameters $\alpha$ and $\beta$. Find the (unconditional) distribution of the first term
of a sequence that is conditionally iid given $\sigma(U)$, where the common conditional
distribution is Bernoulli with parameter $U$. Also find the joint distribution of the
first two terms of the infinite sequence.


Problem 17. Repeat the preceding problem for a sequence of conditionally iid normal
random variables with variance the constant $\sigma^2$ and random mean $\Theta$, where
$\Theta$ has the standard normal distribution.




22.3. Construction of Markov sequences 


Theorem 3 indicates a general method of constructing Markov sequences. Suppose
that in that theorem the functions $R_{n+1}$, $n \ge 1$, have the property that, for
each fixed $x_n$ and $B \in \mathcal{G}_{n+1}$,


$$(x_0, \dots, x_{n-1}) \rightsquigarrow R_{n+1}((x_0, \dots, x_{n-1}, x_n), B)$$
is a constant function. Then it follows from Problem 3 that
$$P[(X_{n+1}, \dots, X_{n+k}) \in B_{n+1} \times \cdots \times B_{n+k} \mid \sigma(X_0, \dots, X_n)]$$


is a measurable function of $X_n$ for $B_{n+j} \in \mathcal{G}_{n+j}$, $j = 1, \dots, k$. A straightforward
Sierpiński class argument now gives that for any $B \in \mathcal{G}_{n+1} \times \mathcal{G}_{n+2} \times \cdots$,
$P[(X_{n+1}, X_{n+2}, \dots) \in B \mid \sigma(X_0, \dots, X_n)]$ is also a measurable function of $X_n$,
and thus equal to $P[(X_{n+1}, X_{n+2}, \dots) \in B \mid \sigma(X_n)]$. By Problem 47 of Chapter 21,
$X$ is a Markov sequence.

Under the circumstances just described, it is natural to write


$R_{n+1}(x_n, \cdot)$ in lieu of $R_{n+1}((x_0, x_1, \dots, x_n), \cdot)$.


The distribution $R_0$ is called the initial distribution of the Markov sequence
and, for $n \ge 1$, $R_n$ is called the $n$th transition function. For fixed $x$, $R_n(x, \cdot)$ is
a transition distribution.

The preceding paragraph characterizes all Markov sequences in Borel spaces
(see Problem 18). Throughout this book all Markov sequences will have values
in Borel spaces. When all of the transition functions $R_n$ are identical, the corresponding
Markov sequence is said to be time-homogeneous, and the function
$R_1 = R_2 = \cdots$ is the transition function.


Problem 18. Show that if $X$ is a random sequence of random variables that take
values in a Borel space $(\Psi, \mathcal{G})$, then $X$ is Markov if and only if for each integer $n \ge 0$
there exists a measurable function $R_{n+1}: (\Psi, \mathcal{G}) \to (\mathcal{Q}, \mathcal{H})$, where $(\mathcal{Q}, \mathcal{H})$ is the
measurable space of probability measures on $(\Psi, \mathcal{G})$, such that $\omega \rightsquigarrow R_{n+1}(X_n(\omega), \cdot)$
is the conditional distribution of $X_{n+1}$ given $\sigma(X_0, \dots, X_n)$. Further show that this
condition is equivalent to the following:


$$P[X_{n+1} \in B \mid \sigma(X_0, \dots, X_n)] = P[X_{n+1} \in B \mid \sigma(X_n)] \text{ a.s.} \tag{22.5}$$


for $n \ge 0$ and $B \in \mathcal{G}$. Hint: Theorem 22 of Chapter 21 will be needed for the first
part.


Problem 19. Add one green ball to the contents of the urn of Problem 9, and 
introduce the condition that whenever the green ball is drawn it is returned to the 
urn. Let $Z_n$ denote the contents of the urn after $n$ balls have been drawn (counting
draws of the green ball), with $Z_0$ denoting the initial contents of the urn. Find the
transition function of the time-homogeneous Markov sequence thus defined. 




A random walk $(S_0, S_1, \dots)$ in either $\mathbb{R}$ or $\mathbb{R}^d$ is a time-homogeneous Markov
sequence. Its transition function $R_1$ is related to the step distribution $Q$ of the
random walk in a simple manner (see Problem 23 of Chapter 21):


$$R_1(x, B) = Q(\{y: x + y \in B\}). \tag{22.6}$$


If we are taking the point of view that $S_0 = 0$, then its initial distribution $R_0$
equals the delta distribution at $\{0\}$. If $R_0$ is arbitrary, the term ‘random walk’
may still be used.
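
Formula (22.6) can be made concrete for a step distribution with finitely many values. In the sketch below, distributions are encoded as dictionaries from points to probabilities; the names `transition` and `push_forward` and the choice of a simple symmetric step distribution on $\mathbb{Z}$ are assumptions made for illustration.

```python
from fractions import Fraction as F

def transition(Q):
    # Build R1 from the step distribution Q: R1(x) is the distribution of x + (a step drawn from Q).
    def R1(x):
        return {x + y: q for y, q in Q.items()}
    return R1

def push_forward(dist, R1):
    # Distribution of the next term, given the distribution of the current term.
    out = {}
    for x, p in dist.items():
        for z, q in R1(x).items():
            out[z] = out.get(z, 0) + p * q
    return out

Q = {1: F(1, 2), -1: F(1, 2)}   # assumed step distribution
R1 = transition(Q)
dist = {0: F(1)}                # initial distribution: delta distribution at 0
for _ in range(2):
    dist = push_forward(dist, R1)
print(dist)
```

Because the same function `R1` is applied at every step, the sequence is time-homogeneous.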


22.4. Polya urns 


We now introduce a model that will serve many purposes. In particular, it will 
illustrate some aspects of preceding sections. 


Example 3. [Polya urns] An urn contains a finite number of balls, each of
which is either blue or orange; there is at least one ball of each of the two colors. 
The contents of the urn are changed according to the following procedure. A 
ball is drawn at random from the urn, its color is observed, and then that ball is 
placed back into the urn along with c more balls of the same color, where c is a 
fixed positive integer. This procedure is repeated infinitely often. (It is assumed 
that there is an unlimited supply of balls of both colors outside the urn, and 
that the urn is large enough to contain an arbitrary finite number of balls.) 

We represent the states of the urn by members of $(\mathbb{Z}^+ \setminus \{0\})^2$, the first coordinate
indicating the number of blue balls and the second the number of orange
balls. The preceding section makes it clear that there is a time-homogeneous
Markov sequence $((X_n, Y_n): n = 0, 1, 2, \dots)$ for which the ‘Markov-type’ description
just given is appropriate. (Alternatively, one could construct a probability
space especially suited to the example at hand; its members could be all infinite
sequences of B’s and O’s.) The initial (possibly random) state of the urn is
represented by the distribution $R_0$ of $(X_0, Y_0)$ and the transition function $R_1$ is
given by


$$R_1((x, y), \cdot) = \frac{x}{x+y}\, \delta_{(x+c,\, y)} + \frac{y}{x+y}\, \delta_{(x,\, y+c)},$$


where $\delta_a$ denotes the delta distribution at $a$.

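For $n = 1, 2, \dots$, let


$$I_n(\omega) = \begin{cases} 1 & \text{if } X_n(\omega) > X_{n-1}(\omega) \\ 0 & \text{otherwise.} \end{cases}$$


Thus, $I_n(\omega)$ equals 1 or 0 according to whether a blue ball or orange ball is
drawn at time $n$. For each finite sequence $(\varepsilon_1, \varepsilon_2, \dots, \varepsilon_k)$ of 0’s and 1’s, we will
calculate


$$P[(I_1, \dots, I_k) = (\varepsilon_1, \dots, \varepsilon_k)] \tag{22.7}$$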




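and obtain a formula that depends only on the numbers of 0’s and 1’s in the
sequence $(\varepsilon_1, \dots, \varepsilon_k)$ and not on their order.
We begin with the following equalities among events:


$$\begin{aligned}
[(I_1, \dots, I_k) = (\varepsilon_1, \dots, \varepsilon_k)]
&= \bigcap_{j=1}^{k} \big[(X_j, Y_j) = (X_{j-1}, Y_{j-1}) + c(\varepsilon_j, 1 - \varepsilon_j)\big] \\
&= \bigcap_{j=1}^{k} \Big[(X_j, Y_j) = (X_0, Y_0) + c \sum_{i=1}^{j} (\varepsilon_i, 1 - \varepsilon_i)\Big].
\end{aligned} \tag{22.8}$$
By iterating the Conditional Fubini Theorem, we obtain the probability of the
event in (22.8):


$$\begin{aligned}
\int \prod_{j=1}^{k} & R_j\Big( (x_0, y_0) + c \sum_{i=1}^{j-1} (\varepsilon_i, 1 - \varepsilon_i),\ \Big\{ (x_0, y_0) + c \sum_{i=1}^{j} (\varepsilon_i, 1 - \varepsilon_i) \Big\} \Big)\, R_0(d(x_0, y_0)) \\
&= \int \prod_{j=1}^{k} \frac{\varepsilon_j \big(x_0 + c \sum_{i=1}^{j-1} \varepsilon_i\big) + (1 - \varepsilon_j)\big(y_0 + c \sum_{i=1}^{j-1} (1 - \varepsilon_i)\big)}{x_0 + y_0 + c(j-1)}\, R_0(d(x_0, y_0)) \\
&= \int \frac{\prod_{l=1}^{r} [x_0 + (l-1)c]\ \prod_{m=1}^{k-r} [y_0 + (m-1)c]}{\prod_{j=1}^{k} [x_0 + y_0 + (j-1)c]}\, R_0(d(x_0, y_0)),
\end{aligned}$$


where $r$ denotes the number of 1’s in the sequence $(\varepsilon_1, \dots, \varepsilon_k)$ (and, of course,
$k - r$ equals the number of 0’s). It follows that $(I_n: n = 1, 2, \dots)$ is an exchangeable
sequence.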
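
The order-independence just derived can be checked numerically when $R_0$ is a delta distribution. The sketch below assumes hypothetical starting values $x_0 = 2$, $y_0 = 3$, and $c = 2$; it compares the step-by-step product of transition probabilities with the closed form in the last integrand above.

```python
from fractions import Fraction as F
from itertools import permutations

def prob_sequential(eps, x0, y0, c):
    # Multiply the one-step transition probabilities along the prescribed draws
    # (1 = blue drawn, 0 = orange drawn), updating the urn after each draw.
    p, x, y = F(1), x0, y0
    for e in eps:
        if e == 1:
            p *= F(x, x + y)
            x += c
        else:
            p *= F(y, x + y)
            y += c
    return p

def prob_closed_form(eps, x0, y0, c):
    # Closed form depending only on k = len(eps) and r = number of 1's.
    k, r = len(eps), sum(eps)
    num = 1
    for l in range(1, r + 1):
        num *= x0 + (l - 1) * c
    for m in range(1, k - r + 1):
        num *= y0 + (m - 1) * c
    den = 1
    for j in range(1, k + 1):
        den *= x0 + y0 + (j - 1) * c
    return F(num, den)

x0, y0, c = 2, 3, 2   # assumed initial contents and reinforcement
values = {prob_sequential(e, x0, y0, c) for e in set(permutations((1, 1, 0, 0, 1)))}
print(values, prob_closed_form((1, 1, 0, 0, 1), x0, y0, c))
```

All rearrangements of the same pattern produce a single value, which is the exchangeability of $(I_n)$ in this special case.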


Problem 20. For the preceding example, calculate 
Pw": Tow!) = 1} | a(o). 
Hint: Use exchangeability to avoid extensive calculation. 


* Problem 21. In Example 3 suppose that there are $x_0$ blue balls and $y_0$ orange balls
in the urn at time 0. (That is, suppose that $R_0$ is the delta distribution at $(x_0, y_0)$.)
Calculate the correlation of $I_m$ and $I_n$ as a function of $x_0$, $y_0$, $c$, $m$ and $n$. Find
the limits (if they exist) of the correlation as $x_0 \to \infty$ (as a function of $y_0$, $c$, $m$,
and $n$) and as $(x_0, y_0) \to (\infty, \infty)$ (as a function of $c$, $m$ and $n$) and as $c \to \infty$ (as
a function of $x_0$, $y_0$, $m$, and $n$). Give intuitive explanations of the results of your
limit calculations.


* Problem 22. For Example 3 show that the events


$$\limsup_{n \to \infty}\, [I_n = 1] \quad \text{and} \quad \limsup_{n \to \infty}\, [I_n = 0]$$


both have probability 1.




Problem 23. Continuing with the notation of Example 3, let
$$J = \inf\{j: X_j > X_0,\ Y_j > Y_0\}.$$


For $c = 1$ and $R_0$ the delta distribution at $(1,1)$, decide whether $E(J) < \infty$ or
$E(J) = \infty$. Does your answer remain the same for other $c$ and $R_0$?


22.5. † Coupon collecting 


Suppose a person is trying to collect a certain class of objects (such as stamps, 
baseball cards, movie posters). We call these objects ‘coupons’, and denote 
by m the number of different coupons contained in a ‘complete’ collection. The 
collector receives coupons of random type one by one, continuing until a complete 
collection is achieved. 

We model the situation just described as follows. Let $(Y_1, Y_2, \dots)$ be an iid
sequence of random variables, each of which is uniformly distributed on the
finite set $\{1, \dots, m\}$. These represent the coupon types received by the collector
during the course of trying to build a complete collection. When $n$ coupons (not
necessarily all different from each other) have been received, the state of the
collection is


$$X_n(\omega) = \{Y_1(\omega)\} \cup \{Y_2(\omega)\} \cup \cdots \cup \{Y_n(\omega)\},$$


$n = 0, 1, 2, \dots$, where it is understood that $X_0 = \emptyset$. It is easily checked that
the random sequence $X = (X_n: n = 0, 1, 2, \dots)$ is a Markov sequence that takes
values in the space of all subsets of $\{1, \dots, m\}$, whose initial distribution is $\delta_{\emptyset}$
and whose transition function $R_1 = R_2 = \cdots$ is given by
$$R_1(x, \cdot) = \frac{\#x}{m}\, \delta_x + \sum_{a \notin x} \frac{1}{m}\, \delta_{x \cup \{a\}}.$$


(In fact, the sequence $X$ is a random walk on the semigroup of subsets of
$\{1, \dots, m\}$ with the semigroup operation being set union. See Chapter 11.)

For $0 \le j \le m$, set

$$N_j(\omega) = \inf\{n: \#(X_n(\omega)) = j\}.$$
Our goal is to study $N_m$, the time at which the full collection is completed. Of
course $N_0 = 0$ and hence
$$N_m = \sum_{j=1}^{m} [N_j - N_{j-1}]. \tag{22.9}$$

Our first step will be to show that the summands in (22.9) are independent
and that $[N_j - N_{j-1}]$ is geometrically distributed with support $\mathbb{Z}^+ \setminus \{0\}$, mean


$$E(N_j - N_{j-1}) = \frac{m}{m - j + 1},$$


variance

$$\operatorname{Var}(N_j - N_{j-1}) = \frac{m(j-1)}{(m - j + 1)^2},$$

and probability generating function


$$s \rightsquigarrow \frac{(m - j + 1)s}{m - (j-1)s}.$$
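For $x$ a subset of $\{1, \dots, m\}$, we let $x^+$ denote the collection of all subsets of
$\{1, \dots, m\}$ that contain every member of $x$ and exactly one more element. If
$\#x = j - 1$, then for integers $p \ge (j - 1)$ and $n > 0$,


$$\begin{aligned}
P[(N_j - N_{j-1}) = n \mid{} & (N_1 - N_0), \dots, (N_{j-1} - N_{j-2}),\ X_p = x,\ N_{j-1} = p] \\
&= [R_1(x, \{x\})]^{n-1}\, R_1(x, x^+) = \left(\frac{j-1}{m}\right)^{n-1} \left(\frac{m - j + 1}{m}\right).
\end{aligned}$$
Now multiply by $P[X_p = x, N_{j-1} = p]$ and sum over $p$ and $x$ to obtain
$$P[(N_j - N_{j-1}) = n \mid (N_1 - N_0), \dots, (N_{j-1} - N_{j-2})]
= \left(\frac{j-1}{m}\right)^{n-1} \left(\frac{m - j + 1}{m}\right). \tag{22.10}$$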

Since this conditional probability is a constant, it follows from Problem 21 of
Chapter 21 that $(N_j - N_{j-1})$ is independent of $\sigma((N_1 - N_0), \dots, (N_{j-1} - N_{j-2}))$.
The independence of the $m$ random variables $(N_j - N_{j-1})$, $1 \le j \le m$, follows
from Proposition 5 of Chapter 9, and the advertised geometric distribution becomes
apparent from (22.10).

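From (22.9) we obtain


$$E(N_m) = \sum_{j=1}^{m} \frac{m}{m - j + 1} = m \sum_{i=1}^{m} \frac{1}{i} \tag{22.11}$$


and


$$\begin{aligned}
\operatorname{Var}(N_m) &= \sum_{j=1}^{m} \frac{m(j-1)}{(m - j + 1)^2} \\
&= m \sum_{i=1}^{m} \frac{m - i}{i^2} \\
&= m^2 \left( \sum_{i=1}^{m} \frac{1}{i^2} - \frac{1}{m} \sum_{i=1}^{m} \frac{1}{i} \right).
\end{aligned}$$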

The random variable $N_m/m$ represents the average amount of time that one
waits for a coupon different from those that one already has. Its mean can be
obtained by dividing both sides of (22.11) by $m$:


$$E\left(\frac{N_m}{m}\right) = \sum_{i=1}^{m} \frac{1}{i}, \tag{22.12}$$


which approaches $\infty$ as $m \to \infty$. By dividing $\operatorname{Var}(N_m)$ by $m^2$ we obtain
$$\operatorname{Var}\left(\frac{N_m}{m}\right) = \sum_{i=1}^{m} \frac{1}{i^2} - \frac{1}{m} \sum_{i=1}^{m} \frac{1}{i}, \tag{22.13}$$


which, unlike the mean, approaches the finite limit $\sum_{i=1}^{\infty} i^{-2} = \pi^2/6$ as $m \to \infty$.
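
The mean and variance formulas are easy to evaluate exactly. The following sketch (function names are illustrative) computes $E(N_m)$ and $\operatorname{Var}(N_m)$ from the closed forms and cross-checks the variance against the sum of the variances of the independent geometric summands.

```python
from fractions import Fraction as F

def mean_N(m):
    # E(N_m) = m * (1 + 1/2 + ... + 1/m)
    return m * sum(F(1, i) for i in range(1, m + 1))

def var_N(m):
    # Var(N_m) = m^2 * (sum of 1/i^2 - (1/m) * sum of 1/i)
    s1 = sum(F(1, i) for i in range(1, m + 1))
    s2 = sum(F(1, i * i) for i in range(1, m + 1))
    return m * m * (s2 - F(1, m) * s1)

def var_by_summands(m):
    # Sum over j of Var(N_j - N_{j-1}) = m (j - 1) / (m - j + 1)^2
    return sum(F(m * (j - 1), (m - j + 1) ** 2) for j in range(1, m + 1))

for m in (2, 6, 52):
    print(m, float(mean_N(m)), float(var_N(m) / (m * m)))
```

For $m = 2$ the values are 3 and 2, in agreement with the description of $N_2$ as one plus a geometric waiting time with success probability $1/2$.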

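We turn to the issue of explicitly calculating the distribution of $N_m$. The case
$m = 1$ is trivial; we take $m > 1$ in what follows. The probability generating
function $\rho$ of $N_m$ is the product of the probability generating functions of the
appropriate random variables of geometric type:


$$\rho(s) = \prod_{k=0}^{m-1} \frac{(m - k)s}{m - ks}.$$


Using partial fractions we obtain
$$\rho(s) = s^{m-1} \sum_{k=0}^{m-1} \left(\frac{k}{m}\right)^{m-1} \Bigg[ \prod_{\substack{l=0 \\ l \ne k}}^{m-1} \frac{m - l}{k - l} \Bigg] \frac{(m - k)s}{m - ks}. \tag{22.14}$$


We recognize that $\frac{(m-k)s}{m-ks}$ is the probability generating function of a geometric
distribution whose density with respect to counting measure on $\mathbb{Z}^+$ is


$$n \rightsquigarrow \begin{cases} \frac{m-k}{m} \left(\frac{k}{m}\right)^{n-1} & \text{if } n \ge 1 \\ 0 & \text{if } n = 0. \end{cases}$$


Thus, $s^{m-1} \frac{(m-k)s}{m-ks}$ is the probability generating function of the distribution whose
density with respect to counting measure is


$$n \rightsquigarrow \begin{cases} \frac{m-k}{m} \left(\frac{k}{m}\right)^{n-m} & \text{if } n \ge m \\ 0 & \text{if } n < m. \end{cases}$$


Taking account of the coefficients and summation in (22.14) we obtain, for $n \ge m$,


$$\begin{aligned}
P[N_m = n] &= \sum_{k=0}^{m-1} \left(\frac{k}{m}\right)^{n-1} \frac{m-k}{m} \prod_{\substack{l=0 \\ l \ne k}}^{m-1} \frac{m - l}{k - l} \\
&= \frac{1}{m^{n-1}} \sum_{k=0}^{m-1} (-1)^{m-1-k} \binom{m-1}{k} k^{n-1}.
\end{aligned} \tag{22.15}$$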
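
Formula (22.15) can be compared with a direct computation based on the Markov sequence of collection sizes. The sketch below (function names are illustrative) tracks the distribution of $\#X_t$ step by step and uses the fact that $N_m = n$ exactly when $\#X_{n-1} = m - 1$ and the $n$th coupon is the missing one.

```python
from fractions import Fraction as F
from math import comb

def pmf_formula(m, n):
    # Right side of (22.15), intended for m > 1 and n >= 1.
    return sum((-1) ** (m - 1 - k) * comb(m - 1, k) * F(k, m) ** (n - 1)
               for k in range(m))

def pmf_markov(m, n):
    # dist[j] = P[#X_t = j]; a new type appears with probability (m - j)/m.
    dist = [F(0)] * (m + 1)
    dist[0] = F(1)
    for _ in range(n - 1):
        new = [F(0)] * (m + 1)
        for j, p in enumerate(dist):
            new[j] += p * F(j, m)
            if j < m:
                new[j + 1] += p * F(m - j, m)
        dist = new
    # The collection is completed at time n if exactly one type is missing
    # at time n - 1 and that type arrives next.
    return dist[m - 1] * F(1, m)

for n in range(1, 8):
    print(n, pmf_formula(3, n), pmf_markov(3, n))
```

The two computations agree for every $n \ge 1$ that was tried, including $n < m$, where both give 0.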


* Problem 24. Simplify the formula for $P[N_m = n]$ obtained above by writing $k^{n-1}$
in terms of falling factorials and Stirling numbers.




Problem 25. Let S(n,k) denote a Stirling number of the second kind. Use the 
answer to the preceding problem to prove that k!S(n,k) equals the number of 
ways of placing n distinguishable objects into k distinguishable urns so that no 
urn is empty. Also, show that S(n,k) equals the number of ways of placing n 
distinguishable objects into k indistinguishable urns so that no urn is empty. 


Problem 26. As an alternative approach to obtaining the formula (22.15), use 
inclusion-exclusion to calculate P[Nm > n]. 


Problem 27. We have kept $m$ fixed for the coupon collecting problem, except when
describing the limiting behavior of the mean and variance. If one wants to treat
$m$ as a variable throughout, then double subscripts are appropriate. We let $N_{j,m}$
denote the first time $j$ types are obtained when the total number of types is $m$,
and ask ourselves about convergence in distribution of the sequence


$$\left( \frac{N_{m,m} - E(N_{m,m})}{\sqrt{\operatorname{Var}(N_{m,m})}} : m = 1, 2, \dots \right).$$

Is Chapter 16 useful for solving this problem? If so, use it. If not, approach the 
problem in some other fashion. 


CHAPTER 23 
Conditional Expectations 


Integration with respect to conditional distributions gives conditional expecta- 
tions. A precise definition is given in the first section of this chapter, after which 
several equivalent formulations are given. An interesting sidelight is the proof, 
at the end of the first section, of the Radon-Nikodym Theorem. The remain- 
ing sections are devoted to various formulas and properties, some of which are 
analogous to properties obtained in Chapters 4, 5, and 8 for (unconditional) ex- 
pectations. Conditional variances are also treated and a useful formula relating 
conditional and unconditional variances is proved. 


23.1. Definition of conditional expectation 


The relationship between (unconditional) expectations and (unconditional) dis- 
tributions is our guide for the following definition. 


Definition 1. Let $X$ be an $\mathbb{R}$-valued random variable defined on a probability
space $(\Omega, \mathcal{F}, P)$, $Q$ the distribution of $X$, and $\mathcal{G}$ a sub-$\sigma$-field of $\mathcal{F}$. The
conditional expectation of $X$ given $\mathcal{G}$, denoted by


E(X | G), 


is the function
$$\omega \rightsquigarrow \int_{\mathbb{R}} x\, Q(dx \mid \mathcal{G})(\omega),$$


where it is to be understood that, evaluated at any particular value of w, the 
conditional expectation can be any member of R or be undefined. 


By Proposition 16 of Chapter 21, conditional distributions of R-valued random 
variables are determined up to a null event. Therefore conditional expectations 
are also determined up to a null event. We have used this fact implicitly in 
the preceding definition when using the word ‘the’ before the term ‘conditional 
expectation’. 

Unless otherwise stated, if we make an assertion concerning an equality that 
involves one or more conditional expectations, we are in particular asserting that 




for almost all w, either the quantities on both sides of the equality are defined 
and equal, or neither side is defined. Usually, the existence of a null event where 
the equality fails is to be understood, even when the modifier ‘a.s.’ has been 
omitted. 

In case G = o(Y) for some random variable Y, we often write 


$E(X \mid Y)$ for $E(X \mid \sigma(Y))$.


The reader can observe that the following is a consequence of Lemma 7 of Chap- 
ter 21. 


Proposition 2. Let $X$ be an $\mathbb{R}$-valued random variable and $Y$ a $(\Psi, \mathcal{H})$-valued
random variable, with $X$ and $Y$ both defined on a common probability space.
Then there exists a measurable function $q: \Psi \to [\mathbb{R} \cup \{\text{undefined}\}]$ such that
$E(X \mid Y) = q \circ Y$ a.s.


Definition 1 has the advantage of being quite general and intuitive. It makes 
sense for any R-valued random variable X and any conditioning o-field G, and is 
particularly useful when one has a formula for the conditional distribution. But 
otherwise it can be difficult to use, and it is quite helpful to have some other 
equivalent versions of the definition. The remainder of this section is mainly 
concerned with three results, each of which provides a different characterization 
of conditional expectation and could therefore be taken as an alternative to 
Definition 1, although in each case, an additional assumption is required for 
equivalence. The following proposition and its consequences will be useful in 
obtaining the first characterization. 


Proposition 3. Let X be a (Ψ, H)-valued random variable on a probability
space (Ω, F, P) and suppose that a conditional distribution Z of X given G exists,
where G is a sub-σ-field of F. Let φ denote a measurable R-valued function on
(Ψ, H). Then

\[ E(\varphi \circ X \mid G) = \int \varphi(x)\, Z(dx) \quad \text{a.s.} \]


PROOF. Apply Proposition 11 of Chapter 21. □


Problem 1. Show that if X is R-valued, then

\[ E(X \mid G) = E(X^+ \mid G) - E(X^- \mid G), \]

with the usual understanding that the equality holds almost surely on the set where
either side is defined. Hint: Use the preceding proposition with φ(x) = x⁺ and
φ(x) = x⁻.


Problem 2. [Positivity of conditional expectation] Show that conditional expecta-
tion has the same kind of positivity as unconditional expectation, in the sense that
if X is R⁺-valued, then E(X | G) is almost surely R⁺-valued.




Problem 3. [Linearity of conditional expectation] Show that conditional expecta- 
tion is linear, in the sense that (i) E(aX | G) = aE(X | G) for real a and (ii) 
E(X +Y |G) = E(X | G)+E(Y | G). In (i), it is to be understood that 
0 · (undefined) = 0 and a · (undefined) = undefined if a ≠ 0. In (ii), the equality
is required to hold almost surely on the set where each of the two conditional
expectations on the right side and their sum are defined, and is not required to
hold elsewhere, even if the left side is defined. Hint: Let Z = (X, Y) and apply
Proposition 3 with φ((x, y)) = x + y, with φ((x, y)) = x, and with φ((x, y)) = y.


Problem 4. Show that E(I_A | G) = P(A | G). Use this fact in conjunction with
Proposition 3 to show that if X ≥ 0, then there exist nonnegative simple random
variables X_n, n ≥ 1, such that X_n ↗ X a.s. and E(X_n | G) ↗ E(X | G) a.s. as
n → ∞. Conclude from this result and the previous problem that if X is R-valued,
then E(X | G) is G-measurable, in the sense that each of the following sets is in G:


{ω: −∞ < E(X | G)(ω) < x}, x ∈ R,
{ω: E(X | G)(ω) = ∞},
{ω: E(X | G)(ω) = −∞},
{ω: E(X | G)(ω) is undefined}.


Also, give a direct proof of this last fact using the Conditional Fubini Theorem. 


Problem 5. Let (X_1, ..., X_n) be a finite exchangeable sequence of R-valued ran-
dom variables, and let S = X_1 + ··· + X_n. Prove that for k = 1, 2, ..., n,

\[ E(X_k \mid S) = \frac{S}{n} \]

almost surely on the set where E(X_1 | S) exists. Hint: Use Problem 13 of Chap-
ter 22.


Problem 6. Let X_1, ..., X_n be iid R-valued random variables with common density
f with respect to Lebesgue measure. Assume that f(x)|x| is bounded for x ∈ R.
Show that E(X_1 | S) = S/n a.s. Hint: Use the formula in Problem 35 of Chapter 21.


Problem 7. Let Y be an R⁺-valued random variable with infinite mean. Let
(X_1, X_2) be a random pair that equals (Y, −Y) or (−Y, Y) each with probabil-
ity 1/2. Discuss the relevance of this example to Problem 5.


Our first characterization of conditional expectation is analogous to the char- 
acterization of conditional probabilities in Proposition 2 of Chapter 21. In order 
to properly understand its statement, we need to make two conventions concern- 
ing the value ‘undefined’. The first concerns the expectation of a random variable 
Y that takes values in the set R ∪ {undefined}. If P[Y = undefined] > 0, then
E(Y) is undefined. Otherwise, E(Y) = E(Ỹ), where Ỹ(ω) = Y(ω) if Y(ω) ∈ R
and Ỹ(ω) = 0 if Y(ω) = undefined. The second convention was introduced
in Problem 3. Both conventions play a role when we consider something like
E(Y ; B) = E(Y I_B).




Theorem 4. Let X be an R-valued random variable and let Y be a random
variable which takes values in the space R ∪ {undefined}. If Y = E(X | G) a.s.
then


(i) Y is measurable with respect to G;
(ii) E(Y ; B) = E(X ; B) for all B ∈ G such that E(X ; B) exists.


On the other hand, if E(X) exists and Y satisfies (i) and (ii), then Y is defined 
and equal to E(X | G) almost surely. 


PROOF. For the proof that E(X | G) is G-measurable, refer to Problem 4. To
complete the proof of the first half of the theorem, it remains to show that (ii)
holds with Y = E(X | G). First consider the case in which X is R⁺-valued. By
Problem 2, E(X | G) is also R⁺-valued, so both expectations in (ii) exist for all
B ∈ G. To calculate E(E(X | G) I_B), we use the Conditional Fubini Theorem,
with X_0 = I_B, X_1 = X, and f(x_0, x_1) = x_0 x_1 (see (22.3) of Chapter 22 for
the appropriate version of the formula in that theorem). Let Q denote the
distribution of X. We obtain

\[
\begin{aligned}
E(E(X \mid G)\, I_B) &= E\Big(I_B \int x\, Q(dx \mid G)\Big) \\
&= E\Big(\int I_B\, x\, Q(dx \mid G)\Big) = E(I_B X)
\end{aligned}
\]

as desired. In this calculation, the first equality follows from the definition of
conditional expectation, the second from the fact that I_B does not depend on x,
and the third is the one that uses the Conditional Fubini Theorem.

Now consider R-valued X. If E(X ; B) exists, then at least one of the two
quantities E(X⁺ ; B) and E(X⁻ ; B) is finite. From the preceding paragraph
we obtain E(X⁺ ; B) = E(E(X⁺ | G); B) and similarly for X⁻. The desired
result now follows from Problem 1 and the linearity of expectation.

For the proof of the second half, we assume that E(X) exists, so that E(X ; B)
exists for all B ∈ G. Thus, if Y satisfies (i) and (ii), E(Y ; B) exists and is
determined for all B ∈ G, so Y is almost surely defined. By the first part of the
theorem, to complete the proof it is enough to show that if Y and Z are almost
surely defined G-measurable random variables such that E(Y ; B) and E(Z ; B)
exist and are equal for all B ∈ G, then Y = Z a.s. Let W = Y − Z. Then W
is G-measurable, and E(W ; B) = 0 for all B ∈ G. By letting B = [W > 0] and
B = [W < 0] in this equation, we see that W = 0 a.s., as desired. □


Problem 8. Show that if E(X) exists, then E(X | G) is almost surely defined, and 
E(E(X | G)) = E(X). Conclude that if E(X) is finite, then E(X | G) is almost 
surely finite. 


Problem 9. Let X have a standard Cauchy distribution and set G = σ(|X|). Show
that E(X | G) = 0 a.s. and that there exists a B ∈ G for which E(X ; B) does not
exist, but E(E(X | G); B) does exist. (This example shows that the assumption
that E(X) exists in the second half of Theorem 4 cannot be eliminated entirely.)


The next set of problems contains our second and third characterizations of 
conditional expectation. The second characterization requires that X have a 
finite second moment. The third requires that X be finite almost surely. 


Problem 10. Let X ∈ L_2(Ω, F, P) and let G be a sub-σ-field of F. Show that
E(X | G) is the orthogonal projection of X onto L_2(Ω, G, P). Hint: Use Theorem 4.


* Problem 11. Suppose that X and V are R-valued with finite second moments and
that E(X | V) = q ∘ V for some decreasing q. Prove that Cov(X, V) ≤ 0.


Problem 12. Use the Radon-Nikodym Theorem to construct E(X | G) if X is
almost surely finite. Hint: First assume that X is nonnegative and bounded. Con-
sider the measurable space (Ω, G) and let μ equal the restriction of P to G. Define
ν(A) = E(X I_A) for all A ∈ G. Show that ν ≪ μ and then check that dν/dμ satisfies
the conditions of Theorem 4.


The preceding problem shows that the Radon-Nikodym Theorem, which was 
introduced without proof in Chapter 8, can be used to construct conditional 
expectations. We will now reverse the viewpoint of this problem and use condi- 
tional expectations to prove the Radon-Nikodym Theorem, under the additional 
assumption that both measures are finite. The reader can do the extension to
the σ-finite case.


PROOF OF RADON-NIKODYM THEOREM IN FINITE CASE. Let μ and ν be finite
measures on a measurable space (Ω, F) having the property that ν ≪ μ.
The existence of dν/dμ is obvious if ν(Ω) = 0, so hereafter we assume ν(Ω) > 0 and
therefore μ(Ω) > 0.

Let Ψ = Ω × {1,2} and denote by H the collection of subsets of Ψ having the
form (A × {1}) ∪ (B × {2}), where A, B ∈ F. We define P by

\[ P((A \times \{1\}) \cup (B \times \{2\})) = \frac{1}{2\mu(\Omega)}\,\mu(A) + \frac{1}{2\nu(\Omega)}\,\nu(B). \]

Clearly, (Ψ, H, P) is a probability space. On this probability space we introduce
the random variable X equal to the indicator function of Ω × {2}, and we condi-
tion with respect to the σ-field G consisting of sets of the form A × {1,2}, where
A ∈ F. For such a set,

\[ E(X ; [A \times \{1,2\}]) = E(E(X \mid G) ; [A \times \{1,2\}]), \tag{23.1} \]

by Theorem 4. The left side equals ν(A)/(2ν(Ω)). We note that there exists a
measurable function g defined on Ω such that E(X | G)(ω, k) = g(ω) for every
ω and both k. Hence the right side of (23.1) equals

\[ \int_A g\, d\Big(\frac{\mu}{2\mu(\Omega)} + \frac{\nu}{2\nu(\Omega)}\Big). \]




From (23.1) and the discussion following it we obtain

\[ \eta(A) := \int_A (1 - g)\, d\nu = \int_A \frac{\nu(\Omega)}{\mu(\Omega)}\, g\, d\mu. \tag{23.2} \]

Letting A = {ω: g(ω) = 1} in (23.2) we see that μ({ω: g(ω) = 1}) = 0. Since
ν ≪ μ we also obtain ν({ω: g(ω) = 1}) = 0. The Reciprocal Rule (Problem 33
of Chapter 8) thus applies to give ν ≪ η and dν/dη = 1/(1 − g) (when g ≠ 1 and
defined to equal any constant when g = 1). We also have dη/dμ = ν(Ω)g/μ(Ω) by
(23.2). By the Chain Rule (Proposition 20 of Chapter 8), dν/dμ exists (equaling
ν(Ω)g / [μ(Ω)(1 − g)] almost everywhere with respect to μ). □
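
The construction in this proof can be traced by hand on a finite set, where the Radon-Nikodym derivative is simply the ratio of point masses. The following Python sketch is an illustration only: the three-point space and the particular masses of `mu` and `nu` are hypothetical choices made for the example, and the function `g` computes E(X | G) on the atom {ω} × {1,2} directly from the definition of P given in the proof.

```python
from fractions import Fraction

# Hypothetical finite measures on a three-point space, with nu << mu
# (mu charges every point, so absolute continuity is automatic).
mu = {"a": Fraction(1), "b": Fraction(2), "c": Fraction(3)}
nu = {"a": Fraction(4), "b": Fraction(0), "c": Fraction(1)}
mu_total = sum(mu.values())
nu_total = sum(nu.values())


def g(w):
    """E(X | G) on the atom {w} x {1, 2}, where X is the indicator of Omega x {2}."""
    p1 = mu[w] / (2 * mu_total)   # P({w} x {1})
    p2 = nu[w] / (2 * nu_total)   # P({w} x {2})
    return p2 / (p1 + p2)


# Formula from the end of the proof: nu(Omega) g / (mu(Omega) (1 - g)).
density = {w: nu_total * g(w) / (mu_total * (1 - g(w))) for w in mu}

# On a finite set the derivative must be the ratio of point masses.
for w in mu:
    assert density[w] == nu[w] / mu[w]
```

Exact rational arithmetic is used so that the comparison with ν({ω})/μ({ω}) is an identity rather than a floating-point approximation.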


Problem 13. Use the validity of the Radon-Nikodym Theorem for the case where
both measures are finite to prove it in case both measures are σ-finite.


23.2. Conditional versions of unconditional theorems 


As has already been done with positivity and linearity, certain other properties
and results concerning expectation will now be adapted to the conditional
setting. The choices made here are not necessarily based on importance, but
rather on various phenomena they highlight. The first example illustrates how
a ‘big’ space like R^∞ can be of use even in a rather simple situation.


Example 1. [Conditional Monotone Convergence Theorem] Let 0 ≤ X_1 ≤
X_2 ≤ ... be an increasing sequence of R-valued random variables defined on
a probability space (Ω, F, P). Then X = (X_1, X_2, ...) is a random sequence
that takes values in the space A of increasing sequences of members of R. It is
easily shown that A is a Borel subset of R^∞, so A is a Borel space by Propo-
sition 20 of Chapter 21. Let Q be the distribution of X. It follows from the
existence theorem for conditional distributions that, for any sub-σ-field G of F,
the random sequence X has a conditional distribution Q(· | G) given G. For
x = (x_1, x_2, ...) ∈ A, let φ_n(x) = x_n, n = 1, 2, ..., and φ(x) = lim_{n→∞} x_n. Since
the functions φ_n and φ are measurable, we have by Proposition 3 that

\[ E(X_n \mid G) = \int \varphi_n(x)\, Q(dx \mid G) \quad \text{a.s.} \]

and

\[ E\big(\lim_{n\to\infty} X_n \mid G\big) = \int \varphi(x)\, Q(dx \mid G) \quad \text{a.s.} \]

Since (φ_n: n = 1, 2, ...) is an increasing sequence of R⁺-valued functions on A,
it follows from the (unconditional) Monotone Convergence Theorem that

\[ \lim_{n\to\infty} E(X_n \mid G) = E\big(\lim_{n\to\infty} X_n \mid G\big), \tag{23.3} \]

giving us a Conditional Monotone Convergence Theorem.




Problem 14. Provide an alternative proof of the Conditional Monotone Conver- 
gence Theorem based on Theorem 4. 


Problem 15. [Conditional Dominated Convergence Theorem] State and prove a 
Conditional Dominated Convergence Theorem. Then notice that, as a consequence 
of Problem 8, the same conclusion holds if the dominating random variable Y has 
finite (unconditional) expectation. 


Problem 16. Find an example for which the Conditional Dominated Convergence
Theorem can be applied for ω in a set of positive probability, but not for almost
every ω.


* Problem 17. Find an example for which the Conditional Dominated Convergence
Theorem can be applied for almost every ω, but for which the (unconditional)
Dominated Convergence Theorem is not applicable.


Problem 18. [Conditional Uniform Integrability Criterion] State and prove a Con- 
ditional Uniform Integrability Criterion. 


The following example shows that unconditional uniform integrability does 
not imply conditional uniform integrability, so the conclusion of the Conditional 
Uniform Integrability Criterion does not necessarily hold for uniformly integrable 
sequences. 


(Counter)example 2. Let P denote Lebesgue measure on the unit square
Ω = (0,1)² and let F denote the Borel field of Ω. Let G denote the σ-field
generated by events of the form

\[ \{(\omega_1, \omega_2) \in \Omega : \omega_1 \in B\} \]

for Borel subsets B of (0,1). Let (B_1, B_2, ...) denote a sequence of Borel subsets
of (0,1) such that the one-dimensional Lebesgue measure of B_n equals 1/n and
limsup B_n = (0,1). (We leave it as an exercise to show that such a sequence
exists.) Let

\[ X_n(\omega_1, \omega_2) = I_{B_n}(\omega_1)\, Z_n(\omega_2), \]


where Z_n is a nonnegative random variable yet to be specified. We want to choose
Z_n so that the sequence (X_1, X_2, ...) converges almost surely and is uniformly
integrable, but with probability 1 fails to be conditionally uniformly integrable.
We leave it to the reader to arrange for:


(i) Z_n(ω_2) → 0 for almost every ω_2;
(ii) ∫_{(0,1)} Z_n(ω_2) dω_2 → ∞;
(iii) [∫_{(0,1)} Z_n(ω_2) dω_2] / n → 0.

From condition (i) it follows that X_n → 0 a.s. and so we are in a situation
where the Uniform Integrability Criterion is relevant. The quotient in condition
(iii) is just the expectation of the nonnegative random variable X_n and so, by
condition (iii), all of the equivalent conditions in the Uniform Integrability Cri-
terion hold. It is easy to calculate the conditional distribution of each X_n given
G and thus the corresponding conditional expectation. The result is

\[ E(X_n \mid G)(\omega_1, \omega_2) = I_{B_n}(\omega_1) \int_{(0,1)} Z_n(u)\, du, \tag{23.4} \]

which, for each (ω_1, ω_2), approaches ∞ on the sequence of those n for which
ω_1 ∈ B_n. Thus, according to the Conditional Uniform Integrability Criterion,
the sequence (X_1, X_2, ...) fails with probability 1 to be conditionally uniformly
integrable.


Problem 19. Show that the Borel sets (B_1, B_2, ...) can be chosen as described in
the preceding example. Also show that conditions (i), (ii), and (iii) can be satisfied.
Finally, make the calculation required to establish (23.4).


Example 3. [Conditional Jensen Inequality] Let X be an R-valued random
variable defined on a probability space (Ω, F, P), Q the distribution of X, and J
an interval that supports Q. Let φ be an R-valued convex function defined on J
and let G be a sub-σ-field of F. Since J is a Borel space, the conditional distri-
bution of X given G exists. Therefore, by the (unconditional) Jensen Inequality,

\[
\begin{aligned}
\int \varphi(x)\, Q(dx \mid G)(\omega) &\ge \varphi\Big(\int x\, Q(dx \mid G)(\omega)\Big) \\
&= (\varphi \circ E(X \mid G))(\omega)
\end{aligned}
\tag{23.5}
\]

for almost all ω such that E(X | G)(ω) exists and is finite. Applying Proposi-
tion 3 to the left side of this last expression, we obtain, for such ω,

\[ E(\varphi \circ X \mid G)(\omega) \ge (\varphi \circ E(X \mid G))(\omega), \tag{23.6} \]

which is the desired conditional form of the Jensen Inequality. In particular, if
X has finite expectation, then it follows from Problem 8 that E(X | G) is a.s.
finite, and so (23.6) holds for almost every ω.

For a further interesting inequality we assume that E(X) is finite. By taking
expectations of both sides of (23.6) we obtain the first inequality below, and
by applying the (unconditional) Jensen Inequality to the random variable
ω ↦ φ ∘ E(X | G)(ω) and then using Problem 8 we obtain the second inequality:

\[ E(\varphi \circ X) \ge E(\varphi \circ E(X \mid G)) \ge \varphi(E(X)). \tag{23.7} \]
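
As an illustration that is not part of the original text, take the convex function φ(x) = x² and assume, for simplicity, that X has a finite second moment. Under that assumption the two inequalities specialize as follows.

```latex
% Special case \varphi(x) = x^2 of (23.6) and (23.7), assuming E(X^2) < \infty.
\[
E(X^2 \mid G) \;\ge\; [E(X \mid G)]^2 \quad \text{a.s.},
\qquad
E(X^2) \;\ge\; E\big([E(X \mid G)]^2\big) \;\ge\; [E(X)]^2 .
\]
```

Since E(E(X | G)) = E(X) by Problem 8, subtracting [E(X)]² from each term of the second chain gives Var(X) ≥ Var(E(X | G)) ≥ 0.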


Problem 20. State (23.6) and (23.7) for the special case φ(x) = |x|. For each of
the three inequalities obtained, decide whether it is an inequality previously known 
to you and, if so, on what basis. 




Problem 21. State conditional versions of some of the main results in Part 2 con- 
cerning independent random variables. Possibilities include conditional versions 
of the Strong Law of Large Numbers, the Kolmogorov 0-1 Law, and the Etemadi 
Lemma. Review the proofs of the unconditional versions of these theorems to verify 
that they can be adapted to the conditional setting. 


23.3. Formulas for conditional expectations 


There are various special cases in which one can make explicit calculations in- 
volving conditional expectations. The following are designed to introduce the 
reader to the more useful and instructive of these. 


Problem 22. From which problem in Chapter 21 does it follow that E(X |G) = X 
if X is G-measurable? 


* Problem 23. From which problem in Chapter 21 does it follow that E(X | G) =
E(X) if X and G are independent?


Problem 24. Let G be a purely atomic σ-field generated by a partition (C_j: j =
1, 2, ...) of a probability space (Ω, F, P), and let X: Ω → R be a random variable.
Then

\[ E(X \mid G) = \sum_{j:\, P(C_j) > 0} E(X \mid C_j)\, I_{C_j} \quad \text{a.s.,} \]

where E(X | C) = E(X; C)/P(C) for events C such that P(C) > 0.
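
The partition formula can be checked by direct enumeration on a finite probability space. The following Python sketch is an illustration only: the six-point space, the uniform weights, the values of X, the partition, and the helper name `cond_exp_partition` are all assumptions made for the example.

```python
from fractions import Fraction

# Hypothetical finite probability space: outcomes 0..5 with uniform weights.
P = {w: Fraction(1, 6) for w in range(6)}
X = {w: Fraction(w * w) for w in range(6)}     # an arbitrary random variable
partition = [{0, 1}, {2, 3, 4}, {5}]           # atoms C_1, C_2, C_3 generating G


def cond_exp_partition(X, P, partition):
    """Return E(X | G) as a dict outcome -> value, using E(X | C) = E(X; C) / P(C)."""
    result = {}
    for C in partition:
        pC = sum(P[w] for w in C)
        if pC > 0:                             # atoms of probability 0 are skipped
            value = sum(X[w] * P[w] for w in C) / pC
            for w in C:
                result[w] = value
    return result


Y = cond_exp_partition(X, P, partition)

# Property (ii) of Theorem 4, checked on each atom B of G.
for C in partition:
    assert sum(Y[w] * P[w] for w in C) == sum(X[w] * P[w] for w in C)

# Consequence stated in Problem 8: E(E(X | G)) = E(X).
assert sum(Y[w] * P[w] for w in P) == sum(X[w] * P[w] for w in P)
```

Because every event in G is a union of atoms, checking property (ii) on the atoms is enough in this finite setting.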


Problem 25. Let X be a Cauchy random variable. Find a purely atomic sub-σ-field
G of σ(X) such that each of the following four events has positive probability:
(i) {ω: E(X | G)(ω) ∈ R};
(ii) {ω: E(X | G)(ω) = ∞};
(iii) {ω: E(X | G)(ω) = −∞};
(iv) {ω: E(X | G)(ω) is undefined}.


The following result is a nontrivial improvement of Problem 22. See also 
Problem 27, which is an important special case. 


Proposition 5. For i = 1, 2, let X_i be a (Ψ_i, H_i)-valued random variable
defined on a probability space (Ω, F, P) and let G be a sub-σ-field of F such that
X_2 is measurable with respect to G. Suppose that each (Ψ_i, H_i) is a Borel space.
Let φ be a measurable R-valued function defined on (Ψ_1, H_1) × (Ψ_2, H_2). If Q_1
is the distribution of X_1, then

\[ E(\varphi \circ (X_1, X_2) \mid G)(\omega) = \int_{\Psi_1} \varphi(x, X_2(\omega))\, Q_1(dx \mid G)(\omega) \tag{23.8} \]

for almost every ω, in the sense that the set of ω for which one side exists but
the other does not is a null event.




Problem 26. Prove the preceding proposition. Hint: Use Proposition 12 or Corol- 
lary 13 of Chapter 21. 


Problem 27. Let X and Y be R-valued random variables on a probability space
(Ω, F, P) and suppose that Y is measurable with respect to a sub-σ-field G of F.
Prove that

\[ E(XY \mid G) = Y\, E(X \mid G) \quad \text{a.s.,} \]

making sure that your proof shows that the set on which one side exists and the
other does not is a null event.


Problem 28. Suppose that X and Y are two R-valued random variables that are
conditionally independent given some σ-field G. Show that

\[ E(XY \mid G)(\omega) = E(X \mid G)(\omega)\, E(Y \mid G)(\omega) \]

for almost every ω for which both factors on the right side are finite.


Problem 29. For conditional expectations, state and prove an analogue of The- 
orem 9 of Chapter 4, the result that describes basic properties of expectations 
such as linearity. Note that linearity and positivity have already been covered in 
Problem 2 and Problem 3. 


Problem 30. Let the random vector (X, Y) be uniformly distributed on the triangle
{(x, y): 0 < x < y < 1}. Fix a number b between 0 and 1. Let A_b = {ω: X(ω) <
b < Y(ω)}. Calculate

\[ E(Y - X \mid A_b) \]

in the following two ways. One way is to find the conditional distribution of (X, Y)
given A_b and integrate y − x with respect to it. A second way is to interpret (X, Y)
as order statistics and use linearity of conditional expectation either in the form

\[ E(Y - X \mid A_b) = E(b - X \mid A_b) + E(Y - b \mid A_b) \]

or in the form

\[ E(Y - X \mid A_b) = E(Y \mid A_b) - E(X \mid A_b). \]

Compare your answers for various values of b and also with the (unconditional)
expectation E(Y − X). Discuss intuitively.


We conclude this section with an important formula involving iterated condi- 
tional expectations. 


Proposition 6. Let X be an R-valued random variable on (Ω, F, P) and
denote by G and H two sub-σ-fields of F, with H ⊆ G. If E(X | H) exists a.s.,
then E(X | G) exists a.s., and

\[ E(E(X \mid G) \mid H) = E(X \mid H) = E(E(X \mid H) \mid G) \quad \text{a.s.} \]
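
The iterated conditioning formula can be observed concretely when both σ-fields are generated by finite partitions, one refining the other. The sketch below is only an illustration: the eight-point space, the values of X, the two partitions `fine` and `coarse`, and the helper `cond_exp` are hypothetical choices made for the example.

```python
from fractions import Fraction

# Hypothetical eight-point space with uniform weights.
P = {w: Fraction(1, 8) for w in range(8)}
X = {w: Fraction(w ** 3 - 5 * w) for w in range(8)}
fine = [{0, 1}, {2, 3}, {4, 5}, {6, 7}]      # atoms of G
coarse = [{0, 1, 2, 3}, {4, 5, 6, 7}]        # atoms of H; each is a union of atoms of G


def cond_exp(Z, atoms):
    """E(Z | sigma-field generated by the atoms), as a dict outcome -> value."""
    out = {}
    for C in atoms:
        pC = sum(P[w] for w in C)
        mean = sum(Z[w] * P[w] for w in C) / pC
        for w in C:
            out[w] = mean
    return out


E_H = cond_exp(X, coarse)

# E(E(X | G) | H) = E(X | H) = E(E(X | H) | G)
assert cond_exp(cond_exp(X, fine), coarse) == E_H
assert cond_exp(E_H, fine) == E_H
```

The second assertion reflects the fact that E(X | H) is already G-measurable, so conditioning it on G changes nothing.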




Problem 31. Prove the preceding proposition. Hint: If E(X²) < ∞, the result is
a consequence of Problem 1 of Chapter 20 and Problem 10. Use the Conditional
Monotone Convergence Theorem to extend to nonnegative X, then (carefully) use
linearity.


Problem 32. Let (Ω, F, P) be a probability space, and let G, H be σ-fields such
that H ⊆ G ⊆ F. Show that for any event A ∈ F, E(P(A | G) | H) = P(A | H).


* Problem 33. Let H ⊆ G and B ∈ G. Use Proposition 6 to prove

\[ E(E(X \mid G)\, I_B \mid H)(\omega) = E(X I_B \mid H)(\omega) \tag{23.9} \]

for almost all ω for which the right side exists. In particular, E(X | G) exists
almost surely on the set where E(X | H) exists.


Problem 34. Let X = (X_n: n = 0, 1, 2, ...) be a Markov sequence taking values
in a Borel space Ψ. Show that for any measurable function f: Ψ → R⁺ and
nonnegative integers k and m < n,

\[ E(E(f \circ X_{n+k} \mid X_n) \mid X_m) = E(f \circ X_{n+k} \mid X_m) \quad \text{a.s.} \]


23.4. Conditional variance 
In this section we will see that, on average, conditioning lowers variances. 


Definition 7. Let X be an R-valued random variable defined on a proba-
bility space (Ω, F, P), Q the distribution of X, and G a sub-σ-field of F. The
conditional variance of X given G is defined by

\[ \mathrm{Var}(X \mid G)(\omega) = \int (x - E(X \mid G)(\omega))^2\, Q(dx \mid G)(\omega), \]

for every ω for which E(X | G)(ω) is finite.


Proposition 8. For X and G as in the preceding definition,

\[ \mathrm{Var}(X \mid G) = E(X^2 \mid G) - [E(X \mid G)]^2 \quad \text{a.s.} \tag{23.10} \]


Problem 35. Prove the preceding proposition, making sure to prove that the set 
where one side exists and the other does not is a null event. 


By Proposition 5, another way to write (23.10) is

\[ \mathrm{Var}(X \mid G) = E((X - E(X \mid G))^2 \mid G), \]

provided that E(X | G) is almost surely defined.

By Problem 8, the conditional expectation of a random variable is finite almost 
surely if the (unconditional) expectation is finite. The following theorem implies 
that the same is true for variances. 




Theorem 9. If X is any R-valued random variable with finite second mo-
ment, then

\[ \mathrm{Var}(X) = E(\mathrm{Var}(X \mid G)) + \mathrm{Var}(E(X \mid G)) \]

for any conditioning σ-field G.

PROOF. Taking note of the fact that the hypothesis implies that the random
variable E(X | G) is finite almost surely and has finite expectation and, as a
consequence, has a variance (not known initially to be finite), we calculate

\[
\begin{aligned}
E(\mathrm{Var}(X \mid G)) &= E\big(E(X^2 \mid G) - [E(X \mid G)]^2\big) \\
&= E(E(X^2 \mid G)) - E\big([E(X \mid G)]^2\big) \\
&= E(X^2) - [E(E(X \mid G))]^2 - \mathrm{Var}(E(X \mid G)) \\
&= E(X^2) - [E(X)]^2 - \mathrm{Var}(E(X \mid G)) \\
&= \mathrm{Var}(X) - \mathrm{Var}(E(X \mid G)),
\end{aligned}
\]

the linearity used for the second equality being valid because the first term on
its right is finite by hypothesis. □
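
The decomposition in Theorem 9 can be confirmed exactly on a finite probability space. The sketch below is an illustration under stated assumptions: the six-point space, the values of X, the partition `atoms` generating G, and the helper names `expect`, `cond_exp`, and `var` are hypothetical. The conditional variance is computed from (23.10).

```python
from fractions import Fraction

# Hypothetical six-point space with uniform weights; G is generated by `atoms`.
P = {w: Fraction(1, 6) for w in range(6)}
X = {w: Fraction(w) for w in range(6)}
atoms = [{0, 1, 2}, {3, 4}, {5}]


def expect(Z):
    """Unconditional expectation E(Z)."""
    return sum(Z[w] * P[w] for w in P)


def cond_exp(Z, atoms):
    """E(Z | G) for the sigma-field G generated by the given atoms."""
    out = {}
    for C in atoms:
        pC = sum(P[w] for w in C)
        mean = sum(Z[w] * P[w] for w in C) / pC
        for w in C:
            out[w] = mean
    return out


def var(Z):
    """Unconditional variance Var(Z)."""
    m = expect(Z)
    return expect({w: (Z[w] - m) ** 2 for w in P})


EX_G = cond_exp(X, atoms)
EX2_G = cond_exp({w: X[w] ** 2 for w in P}, atoms)
var_G = {w: EX2_G[w] - EX_G[w] ** 2 for w in P}   # Var(X | G) via (23.10)

# Theorem 9: Var(X) = E(Var(X | G)) + Var(E(X | G)).
assert var(X) == expect(var_G) + var(EX_G)
```

In this example the two terms on the right are 5/12 and 5/2, which sum to Var(X) = 35/12.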


Problem 36. To what familiar fact does Theorem 9 reduce in case X is the sum
of two independent random variables having finite second moment and G is the
σ-field generated by one of the two summands?


Problem 37. Theorem 9 asserts that the expectation of a conditional variance is 
typically smaller than the (unconditional) variance. Give an intuitive explanation. 


Problem 38. Let Z = X + Y, where X and Y are iid with finite second moment,
and set G = σ(Z). For X and G as defined here calculate each of the three terms
in Theorem 9 in terms of quantities associated with the distribution of Z.


Problem 39. Suppose that X and Y in the preceding problem are normally dis-
tributed. Then the random variable Var(X | G) is a constant a.s. Why? Give
an example that shows that this conclusion can fail if the normality assumption is
dropped.


Problem 40. Let (Y_1, ..., Y_d) have a Dirichlet distribution. Use Theorem 9 to
calculate Var(E(Y_1 | Y_2, ..., Y_d)).


Problem 41. Let (X_k: k = 1, 2, ...) be a sequence of independent R-valued random
variables each having zero mean and finite variance. Let

\[ S_n = \sum_{k=1}^{n} X_k \]

and

\[ F_n = \sigma(X_1, \ldots, X_n) \]

for n > 0. Show that E(S_{n+1} | F_n) = S_n and Var(S_{n+1} | F_n) = Var(X_{n+1}) for
each n.




A random sequence (S_n) having the property mentioned in the preceding
problem that E(S_{n+1} | F_n) = S_n is called a martingale. Thus, R-valued random
walks with expected step equal to 0 are martingales. Martingales will be treated
in the next chapter.
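
The following is a sketch of why the partial sums of Problem 41 have this property. It is offered as a guide rather than a complete solution, and it takes for granted the facts asserted in Problems 3, 22, and 23.

```latex
\[
\begin{aligned}
E(S_{n+1} \mid F_n)
  &= E(S_n \mid F_n) + E(X_{n+1} \mid F_n)
     && \text{linearity (Problem 3)} \\
  &= S_n + E(X_{n+1})
     && \text{$S_n$ is $F_n$-measurable; $X_{n+1}$ is independent of $F_n$} \\
  &= S_n \quad \text{a.s.}
     && \text{zero mean}
\end{aligned}
\]
```

All terms are finite here because each X_k has finite variance, so the caveat about undefined sums in Problem 3 does not arise.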

We have already seen that many results for ordinary expectations generalize 
to conditional expectations. A similar statement holds true for variances. In 
most cases, such generalizations are rather straightforward, both in statement 
and in proof. Here is an exception, the proof being slightly subtle. 


Proposition 10. [Conditional Chebyshev Inequality] Let X be an R-valued
random variable defined on a probability space (Ω, F, P), and let G be a sub-
σ-field of F. Then, for z > 0 and almost every ω for which E(X | G)(ω) is
finite,

\[ P[\,|X - E(X \mid G)| > z \mid G\,](\omega) \le \frac{\mathrm{Var}(X \mid G)(\omega)}{z^2}. \]


PROOF. Fix ω such that E(X | G)(ω) is finite. Letting Q be the distribution
of X, apply the (unconditional) Chebyshev Inequality to the probability measure
Q(· | G)(ω) to obtain

\[ \mathrm{Var}(X \mid G)(\omega) \ge z^2\, Q(\{x : |x - E(X \mid G)(\omega)| > z\} \mid G)(\omega). \]

Rewrite the right side as

\[ z^2 \int I_B(x, E(X \mid G)(\omega))\, Q(dx \mid G)(\omega), \]

where B = {(x, y): |x − y| > z}. By Proposition 5, this last expression equals

\[
z^2\, E(I_B \circ (X, E(X \mid G)) \mid G)(\omega)
= z^2\, P[\,|X - E(X \mid G)| > z \mid G\,](\omega). \qquad \square
\]
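
On a finite probability space with G generated by a partition, the Conditional Chebyshev Inequality becomes a statement about each atom and can be checked directly. The sketch below is an illustration only: the eight-point space, the values of X, the partition `atoms`, and the helper names `cond_exp` and `chebyshev_holds` are hypothetical, and the conditional variance is computed as E((X − E(X | G))² | G).

```python
from fractions import Fraction

# Hypothetical eight-point space with uniform weights; G is generated by `atoms`.
P = {w: Fraction(1, 8) for w in range(8)}
X = {w: Fraction(w * w) for w in range(8)}
atoms = [{0, 1, 2, 3}, {4, 5}, {6, 7}]


def cond_exp(Z, atoms):
    """E(Z | G) on a finite space, where G is generated by the given atoms."""
    out = {}
    for C in atoms:
        pC = sum(P[w] for w in C)
        mean = sum(Z[w] * P[w] for w in C) / pC
        for w in C:
            out[w] = mean
    return out


m = cond_exp(X, atoms)                                     # E(X | G)
v = cond_exp({w: (X[w] - m[w]) ** 2 for w in P}, atoms)    # Var(X | G)


def chebyshev_holds(z):
    """Check P[|X - E(X | G)| > z | G] <= Var(X | G) / z^2 at every outcome."""
    indicator = {w: Fraction(int(abs(X[w] - m[w]) > z)) for w in P}
    tail = cond_exp(indicator, atoms)                      # conditional probability
    return all(tail[w] <= v[w] / z ** 2 for w in P)


assert all(chebyshev_holds(Fraction(k, 2)) for k in range(1, 40))
```

The conditional probability is obtained as the conditional expectation of an indicator, in line with the identity E(I_A | G) = P(A | G) from Problem 4.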


* Problem 42. Let Y be a random variable taking values between 0 and 1. Let
(X_1, X_2, ...) be a random sequence whose conditional distribution given Y is that
of a sequence of Bernoulli trials with parameter Y. Show that for ε > 0,

\[ P\Big[\,\Big|\frac{X_1 + \cdots + X_n}{n} - Y\Big| > \varepsilon\,\Big] \le \frac{E(Y(1 - Y))}{n\varepsilon^2}. \]


Problem 43. Show that the statement obtained by replacing E(X | G) by E(X)
in the Conditional Chebyshev Inequality can be false.

Problem 44. Show that the statement obtained by replacing E(X | G) by E(X) 
and Var(X | G) by Var(X) in the Conditional Chebyshev Inequality can be false. 




Problem 45. [Conditional Markov Inequality] Let X, Y be R-valued random vari-
ables defined on a probability space (Ω, F, P), and let f: R × R → (0, ∞) be a
measurable function which is increasing in the second variable, in the sense that
f(x, y_1) ≤ f(x, y_2) if y_1 ≤ y_2. Show that if G is a sub-σ-field of F such that X is
G-measurable, then for all z ∈ R and almost every ω,

\[ P[\,Y > z \mid G\,](\omega) \le \frac{E(f \circ (X, Y) \mid G)(\omega)}{f(X(\omega), z)}. \]


Problem 46. [Conditional Cauchy-Schwarz Inequality] State and prove the Condi- 
tional Cauchy-Schwarz Inequality. 


PART 5 


Random Sequences 




In the previous parts of the book, we have laid the theoretical foundations of 
modern probability theory. In the remaining parts, we build on these foundations 
to study what are known as ‘stochastic processes’. This part is concerned with 
discrete-time stochastic processes, also known as random sequences, and Part 6 
is concerned with stochastic processes in continuous time. 

We have chosen to give a thorough introduction to five types of random se- 
quences: martingales (Chapter 24), renewal sequences (Chapter 25), Markov 
sequences (Chapter 26), exchangeable sequences (Chapter 27), and stationary 
sequences (Chapter 28). We have already seen examples of each of these types, 
often without making explicit mention of the fact. For instance, every iid se- 
quence is Markov, exchangeable, and stationary, and iid sequences of 0’s and 
1’s are renewal sequences. Random walks are Markov, and when the steps have 
mean 0, they are martingales. 

Thus, much of what we have to say will be concerned with generalizing certain 
of the properties of iid sequences and random walks. But the theory goes far 
beyond mere generalization. By taking the various points of view represented 
by the five chapters in this part of the book, we will be led to ask and answer 
questions that did not arise naturally in the context of iid sequences or random 
walks. And we will encounter random sequences that have interesting behavior 
that is not possible for iid sequences or random walks. 


CHAPTER 24 
Martingales 


We will treat what many would regard as the most important type of random 
sequence, for it is both intrinsically natural and also a tool for treating other 
topics in probability. Martingales are particularly important in the study of 
Markov sequences and Markov processes, as will be seen later in this book. 

Martingales can be used to model ‘fair games’. In this context, an important 
theorem to be proved in this chapter says that the fairness of a game is not 
affected by a broad range of strategies that might be employed by a player. 
Other theorems in this chapter, treating the issue of convergence as time gets 
large, have the flavor of laws of large numbers. 

To illustrate the power of the theory, we give applications to a variety of 
models, including random walks, Polya urns, and some random sequences rele- 
vant to gambling theory. In an extended example at the end of the chapter, we 
determine the optimal strategy in a game known as ‘Red and Black’. 


24.1. Basic definitions 


It is reasonable to regard the following definition as a description of a sequence 
of fair games. This interpretation will become clearer as we present various 
examples throughout the chapter. 


Definition 1. A random sequence X = (X_0, X_1, X_2, ...) of R-valued random
variables with finite mean is a martingale with respect to a filtration (F_n, n ≥ 0)
to which it is adapted if

\[ E(X_{n+1} \mid F_n) = X_n \quad \text{a.s.} \tag{24.1} \]

for all n ≥ 0.


By relaxing the equality (24.1) to one-sided inequalities, we obtain the follow- 
ing two related definitions. 




Definition 2. A random sequence X = (X_0, X_1, X_2, ...) of R-valued ran-
dom variables with finite mean is a supermartingale with respect to a filtration
(F_n, n ≥ 0) to which it is adapted if

\[ E(X_{n+1} \mid F_n) \le X_n \quad \text{a.s.} \tag{24.2} \]

for all n ≥ 0.


Definition 3. A random sequence X = (X_0, X_1, X_2, ...) of R-valued random
variables with finite mean is a submartingale with respect to a filtration (F_n, n ≥
0) to which it is adapted if

\[ E(X_{n+1} \mid F_n) \ge X_n \quad \text{a.s.} \tag{24.3} \]

for all n ≥ 0.


If the filtration, in any of the above definitions, is the minimal filtration for
X, we often drop the phrase “with respect to the filtration ... ”. According to
the preceding definition, an R-valued random sequence X adapted to a filtration
(F_n, n ≥ 0) is a submartingale with respect to that filtration if and only if for
all n,

\[ E((X_{n+1} - X_n) \mid F_n) \ge 0 \quad \text{a.s.} \]

and E(|X_n|) < ∞. By Theorem 4 of Chapter 23 this is equivalent to

\[ E((X_{n+1} - X_n) ; A) \ge 0 \tag{24.4} \]

for all A ∈ F_n.
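
Criterion (24.4) lends itself to a check by enumeration. The sketch below is an illustration under stated assumptions: it takes X to be a simple symmetric random walk started at 0 (a hypothetical choice, as are the horizon `N` and the helper names), lists all paths of length N, and evaluates E((f(X_{n+1}) − f(X_n)); A) on each atom A of the minimal filtration, once with f(x) = x and once with f(x) = x².

```python
from fractions import Fraction
from itertools import product

N = 4                                        # number of steps examined
paths = list(product([-1, 1], repeat=N))     # all step sequences, equally likely
prob = Fraction(1, 2 ** N)                   # probability of each path


def position(path, n):
    """Position of the walk after n steps."""
    return sum(path[:n])


def increment_over_atom(f, n, prefix):
    """E((f(X_{n+1}) - f(X_n)); A), where A = {the first n steps equal prefix}."""
    return sum(
        (f(position(p, n + 1)) - f(position(p, n))) * prob
        for p in paths
        if p[:n] == prefix
    )


for n in range(N):
    for prefix in product([-1, 1], repeat=n):
        # The walk itself: equality in (24.4), as for a martingale.
        assert increment_over_atom(lambda x: x, n, prefix) == 0
        # The squared walk: the quantity equals P(A) = 2^(-n) >= 0.
        assert increment_over_atom(lambda x: x * x, n, prefix) == Fraction(1, 2 ** n)
```

Every event in F_n is a disjoint union of such atoms, so the sign of the quantity on each atom determines its sign on all of F_n. The second assertion indicates that the squared walk satisfies the submartingale inequality in this example.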

A random sequence X is a supermartingale if and only if the random se- 
quence —X is a submartingale, and X is a martingale if and only if it is both 
a supermartingale and a submartingale. Because of the simple relationship that 
exists between sub- and supermartingales, we will usually state results for one 
of these two types of sequences (whichever seems more natural) and leave it to 
the reader to formulate the analogous result for the other type. Of course, every 
result concerning sub- or supermartingales applies to martingales. 


Problem 1. Show that if X is a submartingale with respect to some filtration, then 
it is a submartingale with respect to the minimal filtration. Give an example of a 
random sequence that is a martingale with respect to the minimal filtration but is 
not a martingale with respect to some other filtration to which it is adapted. 


* Problem 2. Let X be a random sequence of R-valued random variables having
finite mean. Prove that X is a submartingale with respect to a filtration (F_n, n ≥ 0)
to which it is adapted if and only if for all m, n ≥ 0,

\[ E(X_{m+n} \mid F_n) \ge X_n \quad \text{a.s.} \]

Problem 3. Prove that E(X_n) ≤ E(X_{m+n}) for all submartingales X and m, n ≥ 0.




The inequality in the preceding exercise will often be used without comment, 
especially in the form of an equality in the case of martingales. 


24.2. Examples 


Martingales, submartingales, and supermartingales appear in a variety of con- 
texts, as illustrated by the following examples and exercises. More examples will 
appear in later sections. 


Example 1. [Polya urns, continued] The random sequences X and Y in Ex- 
ample 3 of Chapter 22 can be used in a natural way to define a martingale. 


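Let

$$V_n = \frac{X_n}{X_n + Y_n},$$

the proportion of blue balls in the urn after n steps. It is easy to obtain the
conditional distribution of $V_{n+1}$ given $\mathcal{F}_n$: it assigns probability

$$\frac{X_n(\omega)}{X_n(\omega) + Y_n(\omega)} \quad \text{to the value} \quad \frac{X_n(\omega) + c}{X_n(\omega) + Y_n(\omega) + c}$$

and probability

$$\frac{Y_n(\omega)}{X_n(\omega) + Y_n(\omega)} \quad \text{to the value} \quad \frac{X_n(\omega)}{X_n(\omega) + Y_n(\omega) + c}.$$

By using this conditional distribution, we obtain the following expression for
the conditional expectation of $V_{n+1}$ given $\mathcal{F}_n$:

$$\frac{X_n[X_n + c] + Y_n X_n}{[X_n + Y_n][X_n + Y_n + c]} = V_n.$$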

Thus the random sequence V is a martingale. 
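
The one-step computation in Example 1 can be checked with exact rational arithmetic. The following is a minimal sketch, assuming that the urn holds x blue balls and y balls of the other color, and that c balls of the drawn color are added at each step; the function name is hypothetical.

```python
from fractions import Fraction


def next_proportion_mean(x: int, y: int, c: int) -> Fraction:
    """Conditional mean of V_{n+1} given X_n = x and Y_n = y (assumed urn rule)."""
    total = x + y
    p_blue = Fraction(x, total)           # probability that a blue ball is drawn
    p_other = Fraction(y, total)          # probability that the other color is drawn
    v_if_blue = Fraction(x + c, total + c)
    v_if_other = Fraction(x, total + c)
    return p_blue * v_if_blue + p_other * v_if_other


# The conditional mean agrees with the current proportion x / (x + y).
for x in range(1, 6):
    for y in range(1, 6):
        for c in range(1, 4):
            assert next_proportion_mean(x, y, c) == Fraction(x, x + y)
```

The check covers only a small grid of urn compositions; the algebraic identity in the example is what establishes the martingale property in general.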


Problem 4. Let $S = (S_0 = 0, S_1, \dots)$ be an R-valued random walk. Assume that
$m = E(S_1)$ exists and is finite. Show that S is a supermartingale, martingale or
submartingale with respect to the minimal filtration, depending on whether m is
$\le$, $=$, or $\ge 0$.


Historically, the study of martingales was motivated by questions concerning 
gambling. In modern times, gambling theory has developed into an important 
area within probability theory. The following example introduces some of the 
questions that interest gambling theorists. 


Example 2. [Red and Black] A roulette wheel in Las Vegas typically has 38 
positions. Two of these are colored green, eighteen of them are colored black, 
and the remaining eighteen are colored red. On any given play, the wheel is 
given a spin, then allowed to turn freely until it comes to a rest, at which point 
there is a mechanism that marks one of the 38 positions. The game is repeated
indefinitely, and we will assume that the marked positions form an iid sequence,
each member of which is uniformly distributed among the 38 possibilities. 

One of the options available to a gambler is to bet an amount of money, known 
as a ‘stake’, on the black numbers. We denote the stake by s, and the amount of 
money available to the gambler for betting, known as her ‘fortune’, by f. The 
rules of casinos require that $0 \le s \le f$. If a black position comes up, the gambler
receives back the stake plus as much again. Otherwise, she loses the stake. Thus 
if she wins, her fortune becomes f + s, and if she loses it becomes f — s. For the 
roulette wheel described above, these two possibilities have probability 18/38 
and 20/38, respectively. Since there is nothing special about these quantities 
from a mathematical point of view, we will allow the probability of winning to 
be any number p in [0,1]. Typically, p < 1/2. 

In this simple model, we assume that the gambler has decided not to make any
other types of bets or to play any other games. Thus, if we denote by $Q(f, s, \cdot)$
the distribution of the gambler’s fortune after betting a stake s from a fortune
f, then

$$Q(f, s, \cdot) = p\,\delta_{f+s} + (1 - p)\,\delta_{f-s},$$

where $\delta_x$ denotes the delta distribution at x.

We wish to define a random sequence which represents the sequence of fortunes 
of a gambler who is repeatedly betting on the black numbers, as described
above. The only choice being made by the gambler is how much to bet each time. 
In our model, we want this choice to depend only on the history of outcomes 
of previous bets. To make this last requirement more precise, we consider a 
sequence of functions that constitute an allowable ‘strategy’ for this game. 

For each $n \ge 0$, let $\gamma_{n+1}$ be a Borel measurable function from $[0,\infty)^{n+1}$ to
$[0,\infty)$ with the property that

$$\gamma_{n+1}(f_0, f_1, \dots, f_n) \le f_n$$

for all $f_0, f_1, \dots, f_n \in [0,\infty)$. The sequence $\gamma = (\gamma_1, \gamma_2, \dots)$ is called a strategy.
The quantity $\gamma_{n+1}(f_0, \dots, f_n)$ represents the stake that will be bet on the $(n+1)$st
play of the game, given that the ‘initial fortune’ was $f_0$ and, for $k = 1, \dots, n$,
the fortune after the $k$th play was $f_k$. Let

$$(\Omega, \mathcal{F}) = ([0,\infty), \mathcal{B})^{\infty},$$

where $\mathcal{B}$ is the Borel σ-field of subsets of $[0,\infty)$. For $\omega = (f_0, f_1, \dots) \in \Omega$ and
$n \ge 0$, let $X_n(\omega) = f_n$. By Theorem 3 of Chapter 22 there exists, for each initial
fortune $f_0$ and strategy $\gamma$, a probability measure P on $(\Omega, \mathcal{F})$ such that

$$P[X_0 = f_0] = 1$$

and, for each $n \ge 0$, $Q(X_n, \gamma_{n+1}(X_0, \dots, X_n), \cdot)$ is a conditional distribution of $X_{n+1}$
given $\sigma(X_0, \dots, X_n)$.




We now have a precise formulation of the situation we wanted to model. Note 
that we have actually defined a family of random sequences, one for each initial 
fortune and strategy. This family is referred to collectively as Red and Black. 

Suppose $p \le 1/2$. Then for any initial fortune $f_0$ and any strategy, the mean of
the conditional distribution given above is less than or equal to $X_n$, so that the
sequence $X = (X_0, X_1, \dots)$ is a supermartingale. When $p < 1/2$, this mean is
strictly less than $X_n$ whenever the stake is nonzero, and one could say that the game is ‘unfair’, in the sense
that it favors the casino over the gambler. Supermartingales are often used to 
model games which favor the casino rather than the gambler. When p = 1/2, 
the sequence X is a martingale. Martingales are often called ‘fair games’. 

One of the chief concerns of gambling theory is that of finding an ‘optimal 
strategy’. The meaning of the word ‘optimal’ depends on the goals of the gam- 
bler. Suppose that she has chosen a certain amount g > 0, and that her only 
goal is to achieve a fortune which is not less than g. Let 


$$N(\omega) = \inf\{n \ge 0 : X_n(\omega) = 0\}$$

and

$$G(\omega) = \inf\{n \ge 0 : X_n(\omega) \ge g\}.$$

These random variables are stopping times for the minimal filtration of X. No
nonzero stakes may be bet after time N, so $G = \infty$ if $G > N$.

The ‘gambler’s ruin problem’ is to determine for each fixed initial fortune and 
strategy, the probability of ‘ruin’, which is the probability that G > N. A related 
optimization problem is to find a strategy which maximizes the probability that 
G < N. After we have developed some tools, we will solve the gambler’s ruin 
problem for the case of constant bets, and in the last section of the chapter, we 
will solve the optimization problem just mentioned. 
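
The construction in Example 2 can be illustrated with a small exact computation. The sketch below is not part of the text: it assumes integer fortunes, represents $Q(f, s, \cdot)$ as a dictionary, and uses a hypothetical strategy `bold_stake` (stake everything, but never more than is needed to reach a goal) purely for illustration.

```python
from fractions import Fraction


def q_kernel(f, s, p):
    """Q(f, s, .) as {fortune: probability}: f + s with prob. p, f - s with prob. 1 - p."""
    if not 0 <= s <= f:
        raise ValueError("the stake must satisfy 0 <= s <= f")
    dist = {}
    dist[f + s] = dist.get(f + s, 0) + p
    dist[f - s] = dist.get(f - s, 0) + (1 - p)
    return dist


def mean(dist):
    """Mean of a finite distribution given as {value: probability}."""
    return sum(value * prob for value, prob in dist.items())


def bold_stake(history, goal=8):
    """Hypothetical strategy: the stake depends on the history only through the last fortune."""
    f = history[-1]
    return min(f, max(goal - f, 0))


def fortune_distribution(f0, strategy, p, plays):
    """Exact distribution of X_plays; full histories are tracked because stakes may depend on them."""
    histories = {(f0,): Fraction(1)}
    for _ in range(plays):
        extended = {}
        for hist, prob in histories.items():
            stake = strategy(hist)
            for value, weight in q_kernel(hist[-1], stake, p).items():
                key = hist + (value,)
                extended[key] = extended.get(key, 0) + prob * weight
        histories = extended
    dist = {}
    for hist, prob in histories.items():
        dist[hist[-1]] = dist.get(hist[-1], 0) + prob
    return dist


p_black = Fraction(18, 38)
after_four = fortune_distribution(3, bold_stake, p_black, 4)
print(mean(after_four) <= 3)   # the expected fortune does not exceed the initial fortune
```

With $p = 1/2$ the same computation returns a mean equal to the initial fortune, in line with the remark that the sequence is then a martingale.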


Problem 5. In real gambling casinos, gamblers have more choices than our gambler 
in Red and Black. For example, in roulette, a gambler may bet a nonnegative 
amount on each position. For betting correctly on a given position, the gambler 
receives back the stake plus 35 times that amount. The bet made in Red and 
Black is equivalent to placing stakes of size s/18 on each of the black positions. 
One could also imagine other types of roulette wheels. Generalize the construction 
of Red and Black to take some of these possibilities into account. 


We continue with more examples of martingales. 


Problem 6. Let $f\colon [0,1) \to \mathbb{R}$ be a measurable function whose integral with respect
to Lebesgue measure $\lambda$ on $[0,1)$ is finite. For $n \ge 0$, let $\mathcal{F}_n$ be the σ-field generated
by the collection of intervals of the form $[(k-1)/2^n, k/2^n)$, $k = 1, 2, \dots, 2^n$. Let
$(\Omega, \mathcal{F}, P) = ([0,1), \mathcal{B}, \lambda)$, and define


$$X_n = E(f \mid \mathcal{F}_n).$$




Show that the random sequence X is a martingale with respect to the filtration
$(\mathcal{F}_n, n \ge 0)$. Find an explicit formula for each random variable $X_n$ in terms of f.


The next exercise generalizes the example in the preceding problem, and in 
so doing, also makes it simpler in a sense. 


Problem 7. Let Z be an R-valued random variable with finite mean, defined on a
probability space $(\Omega, \mathcal{F}, P)$, and let $(\mathcal{F}_n, n \ge 0)$ be a filtration of sub-σ-fields of $\mathcal{F}$.
Prove that the random sequence


$$(E(Z \mid \mathcal{F}_0), E(Z \mid \mathcal{F}_1), \dots)$$


is a martingale with respect to the filtration $(\mathcal{F}_n, n \ge 0)$.


We will find later that every uniformly integrable martingale takes the form 
given in Problem 7 (see Theorem 20). 


* Problem 8. Let $(S_0 = 0, S_1, S_2, \dots)$ be an R-valued random walk and let $\varphi$ be the
characteristic function of the step distribution. (‘Characteristic function’ is defined
in Chapter 13.) Fix $u \in \mathbb{R}$ and define


$$Y_n = \exp(iuS_n)/(\varphi(u))^n$$


for $n \ge 0$. Show that the real and imaginary parts of the random sequence $(Y_n : n \ge 0)$
are martingales.


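Example 3. Let S be an R-valued random walk starting at 0 with steps
having finite expectation. For $n \ge 1$ define

$$\mathcal{G}_n = \sigma\Big(\frac{S_n}{n}, \frac{S_{n+1}}{n+1}, \dots\Big).$$

Note that, for all n,

$$\mathcal{G}_n = \sigma(S_n, X_{n+1}, X_{n+2}, \dots),$$

where $(X_1, X_2, \dots)$ is the sequence of steps of S. It follows from Proposition 4
of Chapter 21 and Problem 5 of Chapter 23 that

$$E\Big(\frac{S_n}{n} \,\Big|\, \mathcal{G}_{n+1}\Big) = \frac{1}{n}\sum_{k=1}^{n} E(X_k \mid \mathcal{G}_{n+1}) = \frac{1}{n}\sum_{k=1}^{n} \frac{S_{n+1}}{n+1} = \frac{S_{n+1}}{n+1}.$$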


If we let $Z_n = S_n/n$ in the preceding example, then the last equation becomes
$E(Z_n \mid \mathcal{G}_{n+1}) = Z_{n+1}$, which is like a reversed version of the defining property
of martingales. We are motivated to make some definitions. A decreasing
sequence of σ-fields is a reverse filtration. A random sequence $Z = (Z_1, Z_2, \dots)$
is called a reverse supermartingale with respect to a reverse filtration $(\mathcal{G}_n, n \ge 1)$
if, for all n, $Z_n$ has finite mean, is measurable with respect to $\mathcal{G}_n$, and satisfies


$$E(Z_n \mid \mathcal{G}_{n+1}) \le Z_{n+1} \quad \text{a.s.}$$




A random sequence Z is called a reverse submartingale if —Z is a reverse super- 
martingale. A reverse martingale is a random sequence which is both a reverse 
supermartingale and a reverse submartingale. We have shown that the sequence 
$(S_1, S_2/2, S_3/3, \dots)$ is a reverse martingale with respect to the sequence of
σ-fields $(\mathcal{G}_n, n \ge 1)$. This fact will be used later in this chapter to give a nice proof
of the Strong Law of Large Numbers. 
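
The reverse martingale property of $S_n/n$ can be checked by brute force in a simple case. The sketch below assumes steps equal to $+1$ with probability p and $-1$ otherwise (an illustrative choice, not taken from the text) and uses the fact that the steps after time $n+1$ are independent of the first $n+1$ steps, so that conditioning on $S_{n+1}$ alone suffices.

```python
from fractions import Fraction
from itertools import product


def conditional_mean_of_average(n, p):
    """Return {s: E(S_n / n | S_{n+1} = s)} for steps +1 (prob. p) and -1 (prob. 1 - p)."""
    numerator = {}    # sum of (S_n / n) * probability over paths with S_{n+1} = s
    denominator = {}  # P[S_{n+1} = s]
    for steps in product((1, -1), repeat=n + 1):
        prob = Fraction(1)
        for x in steps:
            prob *= p if x == 1 else 1 - p
        s_n = sum(steps[:n])
        s_next = s_n + steps[n]
        numerator[s_next] = numerator.get(s_next, 0) + prob * Fraction(s_n, n)
        denominator[s_next] = denominator.get(s_next, 0) + prob
    return {s: numerator[s] / denominator[s] for s in denominator}


# For 0 < p < 1, every conditional mean equals s / (n + 1), whatever the value of p.
n = 4
for s, value in conditional_mean_of_average(n, Fraction(1, 3)).items():
    assert value == Fraction(s, n + 1)
```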


24.3. Doob decomposition 


The next problem shows that every random sequence of random variables having 
finite mean can be changed into a martingale by conditional centering. 


Problem 9. Let $X = (X_0, X_1, \dots)$ be a sequence of R-valued random variables
with finite mean, adapted to a filtration $(\mathcal{F}_n, n \ge 0)$. For $n > 0$, define


$$U_n = E(X_n - X_{n-1} \mid \mathcal{F}_{n-1})$$

and

$$Y_n = X_n - \sum_{k=1}^{n} U_k.$$


Let $Y_0 = X_0$. Show that the random sequence $Y = (Y_0, Y_1, \dots)$ is a martingale
with respect to the filtration $(\mathcal{F}_n, n \ge 0)$.


Theorem 4. [Doob Decomposition] Let X be a submartingale with respect
to a filtration $(\mathcal{F}_n, n \ge 0)$. There exist unique random sequences Y and V such
that

(i) for all $n \ge 0$, $X_n = Y_n + V_n$;

(ii) Y is a martingale with respect to the filtration $(\mathcal{F}_n, n \ge 0)$;

(iii) $0 = V_0 \le V_1 \le V_2 \le \dots$;

(iv) for all $n > 0$, $V_n$ is measurable with respect to $\mathcal{F}_{n-1}$.


Because of property (iv), we say that the random sequence V in the preceding
theorem is previsible with respect to the filtration $(\mathcal{F}_n, n \ge 0)$. Thus, the
Doob Decomposition Theorem states that any submartingale can be written as 
the sum of a martingale and an increasing previsible random sequence. 
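
As an illustration of the decomposition, the following sketch computes Y and V path by path for the submartingale $X_n = |S_n|$, where S is a simple symmetric random walk. This particular submartingale is an assumed example, not one treated in the text; the increments of V are obtained by conditional centering as in Problem 9.

```python
from fractions import Fraction
from itertools import product

STEPS = (1, -1)  # fair coin: each step has probability 1/2


def x_value(path):
    """The submartingale X_n = |S_n| evaluated along a tuple of steps."""
    return abs(sum(path))


def compensator_increment(path):
    """U_{n+1} = E(X_{n+1} - X_n | F_n) on the event that the first n steps equal `path`."""
    return sum(Fraction(x_value(path + (s,)) - x_value(path), 2) for s in STEPS)


def doob_decomposition(path):
    """Return (Y_n, V_n) along `path`, with V_n = U_1 + ... + U_n and Y_n = X_n - V_n."""
    v = sum(compensator_increment(path[:k]) for k in range(len(path)))
    return x_value(path) - v, v


# Check the martingale property of Y and the monotonicity of V on all short paths.
for n in range(5):
    for path in product(STEPS, repeat=n):
        y, v = doob_decomposition(path)
        children = [doob_decomposition(path + (s,)) for s in STEPS]
        assert Fraction(1, 2) * sum(child_y for child_y, _ in children) == y
        assert all(child_v >= v for _, child_v in children)
```

Note that $V_{n+1}$ depends only on the first n steps, which is the previsibility property (iv).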


* Problem 10. Prove Theorem 4. Also state and prove an analogous fact concerning 
supermartingales. Hint: Use Problem 9. 


* Problem 11. Let S be a random walk on R starting at 0. Assume that $E(S_1) = 0$
and $\mathrm{Var}(S_1) < \infty$. Show that the random sequence


$$S^2 = (S_0^2, S_1^2, S_2^2, \dots)$$


is a submartingale. Find an increasing previsible sequence V such that $S^2 - V$ is
a martingale.




24.4. Transformations of submartingales 


We turn now to two results concerning different ways in which submartingales 
may be transformed into other submartingales. As usual, there are analogous 
results for supermartingales. The first result, Proposition 5, states that a convex, 
increasing function of a submartingale is itself a submartingale. Proposition 6 
says that both the sum and the maximum of two submartingales are submartin- 
gales. 


Proposition 5. Let $X = (X_0, X_1, X_2, \dots)$ be an R-valued random sequence
adapted to a filtration $(\mathcal{F}_n : n \ge 0)$. Let $\varphi$ be an R-valued function which is
convex on an interval that contains the supports of the distributions of the random
variables $X_n$, $n \ge 0$. Assume that $E(|\varphi \circ X_n|) < \infty$ for all n. Let Y be the random
sequence

$$(\varphi \circ X_0, \varphi \circ X_1, \varphi \circ X_2, \dots).$$

If X is a martingale with respect to $(\mathcal{F}_n : n \ge 0)$, or if X is a submartingale with
respect to $(\mathcal{F}_n : n \ge 0)$ and $\varphi$ is also increasing, then Y is a submartingale with
respect to $(\mathcal{F}_n : n \ge 0)$.


Problem 12. Use the Conditional Jensen Inequality (Example 3 in Chapter 23) to 
prove the preceding proposition. 


Problem 13. Repeat the first part of Problem 11, using Proposition 5. 


Proposition 6. Let X and Y be submartingales with respect to a filtration
$(\mathcal{F}_n, n \ge 0)$. Then the sequences


$$(X_0 + Y_0, X_1 + Y_1, X_2 + Y_2, \dots)$$

and

$$(X_0 \vee Y_0, X_1 \vee Y_1, X_2 \vee Y_2, \dots)$$


are submartingales with respect to the filtration $(\mathcal{F}_n, n \ge 0)$.


Problem 14. Prove the preceding proposition. 


Problem 15. Let $X = (X_1, X_2, \dots)$ be a random sequence of independent R-valued
random variables with common distribution function F. For $n \ge 1$ let $F_n$ be the
empirical distribution function based on n observations $X_1, \dots, X_n$, and let


$$Y_n = \sup\{|F_n(x) - F(x)| : x \in \mathbb{R}\}.$$


Show that the random sequence $Y = (Y_n : n \ge 1)$ is a reverse submartingale with
respect to an appropriate filtration. Hint: First, let $\varphi(y) = |y|$ and apply a reverse
martingale version of Proposition 5 to the sequence $(F_n(x) - F(x), n \ge 1)$ for fixed
x. Then extend and modify Proposition 6 to apply to the supremum of countably
many reverse submartingales. Then complete the proof.




24.5. Another transformation: optional sampling 


A central feature of martingale theory concerns random subsequences obtained 
from martingales by ‘sampling’ them at an increasing sequence of stopping times. 
If X is a random sequence adapted to a filtration $(\mathcal{F}_n, n \ge 0)$, and if T is an
almost surely finite stopping time for the same filtration, then according to
Proposition 9 of Chapter 11, $X_T$ is a random variable which is measurable with respect
to $\mathcal{F}_T$. We may regard $X_T$ as the value obtained by sampling the sequence X
at the random time T. The σ-field $\mathcal{F}_n$ is often interpreted as the information
available at time n, so the assumption that T be a stopping time means that the
decision (‘option’) to sample at time n must be based on information available
at time n. If we sample X at successive random times $T_0 \le T_1 \le T_2 \le \dots$, we
obtain a new random sequence. The Optional Sampling Theorem, to be stated
and proved below, implies that if the original sequence is a submartingale, then 
the new sequence obtained by sampling is also a submartingale, provided each 
of the sampling times satisfies certain conditions. 


Definition 7. Let $X = (X_0, X_1, \dots)$ be a random sequence of R-valued random
variables with finite mean. A $\overline{\mathbb{Z}}^+$-valued random variable T that is a.s.
finite satisfies the sampling integrability conditions for X if the following two
conditions hold:

(i) $E(|X_T|) < \infty$;
(ii) $\liminf_{m \to \infty} E(|X_m|; [T > m]) = 0$.


Theorem 8. [Optional Sampling] Let $T_0 \le T_1 \le T_2 \le \dots$ be an increasing
sequence of stopping times for a filtration $(\mathcal{F}_m, m \ge 0)$, and let $X = (X_m : m \ge 0)$
be a submartingale with respect to the same filtration. Assume that each $T_n$
is finite almost surely and satisfies the sampling integrability conditions for X.
Define

$$Y_n(\omega) = X_{T_n(\omega)}(\omega)$$

and

$$\mathcal{G}_n = \mathcal{F}_{T_n}.$$

Then the random sequence $Y = (Y_0, Y_1, \dots)$ is a submartingale with respect to
the filtration $(\mathcal{G}_n, n \ge 0)$.


PROOF. By Proposition 9 of Chapter 11, Y is an R-valued random sequence
which is adapted to the filtration $(\mathcal{G}_n, n \ge 0)$. Each random variable in the
sequence Y has finite mean by the sampling integrability condition (i). Thus, it
remains to show that

$$E(Y_{n+1} \mid \mathcal{G}_n) \ge Y_n \quad \text{a.s.},$$

or equivalently that

$$E(Y_{n+1}; A) \ge E(Y_n; A)$$

for all $A \in \mathcal{G}_n$. Fix $n \ge 0$ and $A \in \mathcal{G}_n$. For $m = 0, 1, 2, \dots$, let

$$B_m = A \cap [T_n = m].$$

By hypothesis, $T_n$ is finite almost surely, so A is the disjoint union of the sets $B_m$
and a null set. On the set $B_m$, $Y_n = X_m$. Thus, by the Dominated Convergence
Theorem, it is enough to prove that

$$E(Y_{n+1}; B_m) \ge E(X_m; B_m)$$

for all m. For an arbitrary integer $p > m$,

$$E(Y_{n+1}; B_m) = E(X_m; B_m) + \sum_{l=m}^{p-1} E\big((X_{l+1} - X_l); B_m \cap [T_{n+1} > l]\big) + E\big((Y_{n+1} - X_p); B_m \cap [T_{n+1} > p]\big). \tag{24.5}$$

For each l,

$$[T_{n+1} > l] = [T_{n+1} \le l]^c \in \mathcal{F}_l.$$

Also $B_m \in \mathcal{F}_m \subseteq \mathcal{F}_l$ for $l \ge m$. For such l it thus follows from (24.4) that

$$E\big((X_{l+1} - X_l); B_m \cap [T_{n+1} > l]\big) \ge 0.$$

From (24.5) we then obtain

$$E(Y_{n+1}; B_m) \ge E(X_m; B_m) + E\big((Y_{n+1} - X_p); B_m \cap [T_{n+1} > p]\big). \tag{24.6}$$

By the sampling integrability condition (ii), we can let $p \to \infty$ along an appropriate
subsequence so that

$$E(|X_p|; B_m \cap [T_{n+1} > p]) \le E(|X_p|; [T_{n+1} > p]) \to 0.$$

Since

$$1_{[T_{n+1} > p]} \to 0 \quad \text{a.s.}$$

as $p \to \infty$, it follows from sampling integrability condition (i) and the Dominated
Convergence Theorem that

$$\lim_{p \to \infty} E(Y_{n+1}; B_m \cap [T_{n+1} > p]) = 0.$$

Thus the second expectation on the right of (24.6) converges to 0 as $p \to \infty$
along an appropriate subsequence, and the proof is complete. □


In view of Problem 3 we see that if $S \le T$ are stopping times that satisfy the
sampling integrability conditions for a submartingale X, then $E(X_S) \le E(X_T)$.
This fact will be used without comment, often with $S = 0$.


Problem 16. Show that the Optional Sampling Theorem can be improved some- 
what, in that $|X_m|$ can be replaced by $X_m^+$ in the sampling integrability condition
(ii). State and prove a version of the Optional Sampling Theorem for supermartin- 
gales, and incorporate a similar improvement into your hypotheses. 




Sampling integrability condition (i) is clearly necessary in the Optional Sam- 
pling Theorem since submartingales are, by definition, sequences of random vari- 
ables with finite means. The following example shows that condition (ii) cannot 
be eliminated entirely. 


Example 4. [Double or Nothing] Let X be a random sequence for which the
conditional distribution of $X_{n+1}$ given $\sigma(X_0, \dots, X_n)$ is the uniform distribution
on the two-point set $\{0, 2X_n\}$. Let $X_0 = 1$. Such a sequence is a special case of
Red and Black. It represents the fortunes of a gambler who starts with 1 dollar,
then stakes her entire fortune at each step on a fair bet. After each bet, the
fortune either doubles or vanishes, each possibility occurring with probability $1/2$.
It is easily checked that X is a martingale. Let


$$T(\omega) = \inf\{m \ge 0 : X_m(\omega) = 0\}.$$


We leave it to the reader to check that T is of geometric type and, in particular,
that T is finite almost surely. Since $E(X_T) = E(0) = 0 < 1 = E(X_0)$, T does
not satisfy the sampling integrability conditions for X. It must be condition (ii)
that fails, because condition (i) is clearly satisfied.
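
The contrast between sampling at the bounded times $T \wedge n$ and sampling at T itself can be seen numerically. The sketch below iterates the Double or Nothing transition exactly; since the fortune stays at 0 once it reaches 0, the distribution of $X_n$ coincides with that of $X_{T \wedge n}$. The function names are hypothetical.

```python
from fractions import Fraction


def fortune_distribution(n):
    """Exact distribution of X_n (equivalently X_{T ∧ n}) as {value: probability}."""
    dist = {Fraction(1): Fraction(1)}          # X_0 = 1
    for _ in range(n):
        updated = {}
        for value, prob in dist.items():
            for nxt in (0, 2 * value):          # uniform on the two-point set {0, 2 X_n}
                updated[nxt] = updated.get(nxt, 0) + prob / 2
        dist = updated
    return dist


def expected_fortune(n):
    """E(X_{T ∧ n}), computed from the exact distribution."""
    return sum(value * prob for value, prob in fortune_distribution(n).items())


for n in range(12):
    assert expected_fortune(n) == 1             # agrees with E(X_0) at every bounded time
print(fortune_distribution(10)[0])              # P[T <= 10], close to 1
```

So $E(X_{T \wedge n}) = 1$ for every n even though $X_{T \wedge n} \to X_T = 0$ almost surely, which is consistent with the failure of the sampling integrability conditions for T.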


Problem 17. Show directly that sampling integrability condition (ii) fails in the 
preceding example. 


The sampling integrability conditions are somewhat technical in nature. They 
are chosen to make the proof of the Optional Sampling Theorem work. Here are 
some results and exercises concerning sets of conditions which imply the sam- 
pling integrability conditions. They are often easier to check than the sampling 
integrability conditions themselves, and they apply in a number of important 
situations. 


Proposition 9. If T is an almost surely bounded stopping time, then T satisfies
the sampling integrability conditions for any sequence X of R-valued random
variables with finite mean.


PROOF. Suppose that $T \le a$ a.s. for some real number a. Then condition (i)
follows from the following inequality:


$$|X_T| \le \sum_{0 \le l \le a} |X_l| \quad \text{a.s.}$$


Condition (ii) follows from the fact that, for $m \ge a$, $P[T > m] = 0$. □




Problem 18. Let X be a submartingale and T a stopping time, both with respect
to a filtration $(\mathcal{F}_n, n \ge 0)$. Define a random sequence Y by


$$Y_n(\omega) = X_{T(\omega) \wedge n}(\omega).$$


By Proposition 9 and the Optional Sampling Theorem, Y is a submartingale
with respect to the filtration $(\mathcal{F}_{T \wedge n}, n \ge 0)$. Prove that for all integers $p \ge 0$ and
all sets $A \in \mathcal{F}_{T \wedge p}$,

$$E(Y_p; A) \le E(X_p; A).$$


If, in addition, X is a martingale with respect to the filtration $(\mathcal{F}_n, n \ge 0)$, show
that

$$E(|Y_p|; A) \le E(|X_p|; A).$$


Problem 19. Let $X = (X_0, X_1, \dots)$ be an $\mathbb{R}^\infty$-valued random variable, and T an
almost surely finite $\overline{\mathbb{Z}}^+$-valued random variable. Show that if there exists a real
number c such that for almost every $\omega$,


$$\max\{|X_0(\omega)|, |X_1(\omega)|, \dots, |X_{T(\omega)}(\omega)|\} \le c,$$

then T satisfies the sampling integrability conditions for X.


* Problem 20. Show that if $0 \le X_0 \le X_1 \le X_2 \le \dots$, then (i) implies (ii) in the
sampling integrability conditions. 


Proposition 10. Let X be a random sequence of R-valued random variables
with finite mean, adapted to a filtration $(\mathcal{F}_n, n \ge 0)$, and let T be an almost
surely finite stopping time for the same filtration. Suppose that for each $n > 0$,
there exists a finite constant $m_n$ such that


$$E(|X_n - X_{n-1}| \mid \mathcal{F}_{n-1})(\omega) \le m_n$$


for almost all $\omega$ in the set $[T \ge n]$. Let


$$f(n) = \sum_{i=1}^{n} m_i.$$


If $E(f \circ T) < \infty$, then T satisfies the sampling integrability conditions for X.


PROOF. For $n \ge 0$, let


$$Y_n = |X_0| + \sum_{i=1}^{n} |X_i - X_{i-1}|.$$


Then $Y_n \ge |X_n|$ for all n, so it is enough to show that T satisfies the sampling
integrability conditions for the random sequence $Y = (Y_0, Y_1, \dots)$. By the
Monotone Convergence Theorem,


$$E(Y_T) = E(|X_0|) + \sum_{n=1}^{\infty} E(|X_n - X_{n-1}|; [T \ge n]). \tag{24.7}$$


Since $[T \ge n] = [T \le n-1]^c \in \mathcal{F}_{n-1}$, it follows from Theorem 4 of Chapter 23
that

$$E(|X_n - X_{n-1}|; [T \ge n]) \le m_n P[T \ge n].$$

Sampling integrability condition (i) follows from $E(|X_0|) < \infty$, (24.7), and the
following computation:


$$\sum_{n=1}^{\infty} m_n P[T \ge n] = \sum_{n=1}^{\infty} \sum_{k=n}^{\infty} m_n P[T = k] = \sum_{k=1}^{\infty} \sum_{n=1}^{k} m_n P[T = k] = \sum_{k=1}^{\infty} f(k) P[T = k] = E(f \circ T) < \infty.$$


Since Y is a nonnegative increasing random sequence, condition (ii) now follows
from Problem 20. □


Problem 21. Show that the hypotheses of the preceding proposition are satisfied
if $E(T) < \infty$, $E(|X_0|) < \infty$, and $|X_n - X_{n-1}| \le b$ a.s. for some finite b and all
$n > 0$.


Problem 22. Let S be a random walk on R whose steps have finite mean. Prove 
that if T is a stopping time for which $E(T) < \infty$, then T satisfies the sampling
integrability conditions for S. 


Proposition 9 and Proposition 10 both impose conditions on T that ensure 
that the sequence Y does not get sampled too late. It turns out that if the 
sequence X is a uniformly integrable submartingale and if T is an almost surely 
finite stopping time, then T cannot be too late. In order to prove this result, we 
need a lemma concerning uniformly integrable collections of random variables. 


Lemma 11. Let $\{X_t, t \in \mathcal{T}\}$ be a uniformly integrable collection of R-valued
random variables defined on a common probability space, and let $(A_1, A_2, \dots)$ be
a sequence of events which converges to the empty set. Then


$$\lim_{n \to \infty} \sup_{t \in \mathcal{T}} E(|X_t|; A_n) = 0. \tag{24.8}$$

PROOF. Fix $\varepsilon > 0$. By the definition of uniform integrability, there exists a
finite constant m such that, for all $t \in \mathcal{T}$,

$$E(|X_t|; B_t) < \varepsilon,$$

where $B_t = [|X_t| > m]$. Thus

$$E(|X_t|; A_n) \le \varepsilon + m P(A_n \cap B_t^c) \le \varepsilon + m P(A_n)$$


for all $t \in \mathcal{T}$. Since $A_n \to \emptyset$, it now follows from the Continuity of Measure
Theorem that $E(|X_t|; A_n) < 2\varepsilon$ for all sufficiently large n and all $t \in \mathcal{T}$. Since
$\varepsilon$ is an arbitrary positive quantity, the proof is complete. □




Theorem 12. Let X be a uniformly integrable submartingale and T an almost
surely finite stopping time, both with respect to a filtration $(\mathcal{F}_n, n \ge 0)$.
Then T satisfies the sampling integrability conditions for X.


PROOF. Sampling integrability condition (ii) follows from Lemma 11 with
$A_n = [T > n]$.

In order to verify condition (i), we define a random sequence Y by letting
$Y_n = X_{T \wedge n}$ for $n \ge 0$. By Proposition 9 and the Optional Sampling Theorem, Y
is a submartingale with respect to the filtration $(\mathcal{F}_{T \wedge n}, n \ge 0)$. It is enough to
show that Y is uniformly integrable, since condition (i) then follows immediately
from the Uniform Integrability Criterion and the fact that $Y_n \to X_T$ a.s. as
$n \to \infty$. Use the Doob Decomposition Theorem to write $X = Z + V$, where Z
is a martingale and V is an increasing previsible random sequence, both with
respect to the filtration $(\mathcal{F}_n, n \ge 0)$. Let $V_\infty = \lim_n V_n$. By the Monotone
Convergence Theorem,

$$E(V_\infty) = \lim_{n \to \infty} E(V_n) = \lim_{n \to \infty} E(X_n - Z_n) = \lim_{n \to \infty} [E(X_n) - E(Z_0)] \le E(|Z_0|) + \sup_n E(|X_n|).$$

By the uniform integrability of the sequence X, this last quantity is finite. It
follows from the Uniform Integrability Criterion that the sequence V is uniformly
integrable. Therefore $Z = X - V$ is also uniformly integrable. Since


$$V_T = \lim_{n \to \infty} V_{T \wedge n} \quad \text{a.s.},$$


and since $E(V_T) \le E(V_\infty)$, it also follows from the Monotone Convergence Theorem
and the Uniform Integrability Criterion that the sequence $(V_{T \wedge n}, n \ge 0)$
is uniformly integrable. In view of the fact that $Y_n = Z_{T \wedge n} + V_{T \wedge n}$, we only
need prove that the sequence $(Z_{T \wedge n} : n \ge 0)$ is uniformly integrable to finish the
proof.

Let $z > 0$. By Problem 18,


$$E(|Z_{T \wedge n}|; [|Z_{T \wedge n}| \ge z]) \le E(|Z_n|; [|Z_{T \wedge n}| \ge z]) \le E(|Z_n|; [|Z_n| \ge z]) + E(|Z_n|; [|Z_T| \ge z]).$$


By the definition of uniform integrability and Lemma 11, this last expression
converges to 0 uniformly in n as $z \to \infty$, as desired. □


While proving Theorem 12, we have shown the following fact which will be 
useful later. 


Proposition 13. Let X be a uniformly integrable submartingale. Suppose 
that X = Z + V, where Z is a martingale and V is a monotone previsible
random sequence. Then Z and V are uniformly integrable. 




* Problem 23. Let X be a submartingale and T an almost surely finite stopping
time, both with respect to a filtration $(\mathcal{F}_n, n \ge 0)$. Show that if $E(|X_n|^p)$ is
uniformly bounded in n for some $p > 1$, then T satisfies the sampling integrability
conditions for X.


24.6. Applications of optional sampling 


We have obtained several results which aid in verifying the conditions of the 
Optional Sampling Theorem. It is time to look at some of the ways in which 
that theorem may be applied. 


Problem 24. Let S = (0, S1, S2,...) denote a random walk on R whose steps have 
mean 0. Set 

$$T = \inf\{n : S_n > 0\}.$$

Show that $E(T) = \infty$. Hint: Assume that $E(T) < \infty$, then apply Problem 22 and
the Optional Sampling Theorem to obtain a contradiction.


Problem 25. Let S be a random walk on Z starting at 0. Assume that
$0 < E(|S_1|) < \infty$. Let T be the time of the first return to 0. Prove that $E(T) = \infty$.
Hint: Let $\mu = E(S_1)$ and consider the martingale


$$(S_n - \mu n : n \ge 0).$$

The preceding problem may also be relevant.


* Problem 26. Let X be a supermartingale with respect to some filtration. Assume
that $X_0 = f_0$ and that $0 \le X_n \le g$ for all n. Show that for any almost surely finite
stopping time T,


$$P[X_T = g] \le \frac{f_0}{g}.$$


The result of the preceding exercise is given an interpretation in the following 
example. 


Example 5. [Gambler’s ruin] Let X be a sub- or supermartingale. Assume
that $X_0 = f_0$. We imagine that $X_n$ represents the fortune of a gambler after n
bets. We will be mostly interested in the case in which X is a supermartingale
(the game is unfavorable). As in Example 2, the gambler has the goal of achieving
a fortune of at least g before going broke. We will assume that the gambler never
bets more money than he has, so that $X_n \ge 0$ for all n. We will also assume that
he does not bet more than is needed to reach the goal, so that $X_n \le g$ for all n.
Thus, X is bounded and the Optional Sampling Theorem can be used with any
almost surely finite stopping time. Let


$$T = \inf\{n \ge 0 : X_n = 0 \text{ or } g\}.$$


Because of the assumptions made on the types of bets allowed, the gambler
will stop betting at time T, which is a stopping time. He reaches the goal if and
only if T is finite and $X_T = g$. Depending on the strategy chosen, T may or may
not be almost surely finite. For example, the gambler may choose a strategy
in which all but finitely many of the stakes are of size 0. Nevertheless, in the
supermartingale case, we can, for each n, apply the preceding problem to the
bounded stopping time $T \wedge n$ to obtain


$$p(f_0, g) = \lim_{n \to \infty} P[X_{T \wedge n} = g] \le \frac{f_0}{g}, \tag{24.9}$$

where $p(f_0, g)$ is the probability that the gambler reaches the goal g when the
initial fortune is $f_0$. Note that this bound depends neither on the strategy
employed by the gambler nor on the particular game being played, provided
that the sequence X is a supermartingale. Thus, it places an a priori limit on
the best that the gambler can do in an unfavorable game.

We will now show how to compute $p(f_0, g)$ for the case when the game is Red
and Black, g and $f_0$ are integers (of course, satisfying $0 \le f_0 \le g$), and the
strategy is to bet 1 any time the fortune is strictly between 0 and g. Thus, up
to the stopping time T, the sub- or supermartingale X describing the sequence
of fortunes is a simple random walk, where the probability that any step equals
1 is the probability p of winning a bet. We leave it to the reader to check that,
for the strategy just described, T is almost surely finite and, therefore,


$$p(f_0, g) = P[X_T = g] = \frac{E(X_T)}{g}. \tag{24.10}$$


If $p = 1/2$, X is a martingale and the Optional Sampling Theorem applies (for
the same reason as in the solution of Problem 26) to give $E(X_T) = f_0$. It then
follows from (24.10) that $p(f_0, g) = f_0/g$. Therefore, in this case the a priori
upper bound in (24.9) is achieved and the strategy of always betting 1 is seen to
be optimal.

When $p \ne 1/2$, we must use a different method to calculate $p(f_0, g)$, one which
appears, in the present context, to be merely a clever trick. We will see in
Section 3 of Chapter 26 that this trick is part of a general theory for computing
certain probabilities for Markov sequences. We omit the trivial cases $p = 0$ and
$p = 1$ and consider the sequence $Y = (Y_n : n \ge 0)$ defined by


$$Y_n = (q/p)^{X_n},$$


where $q = 1 - p$. Let $(\mathcal{F}_n, n \ge 0)$ be the minimal filtration for X. The sequence
Y is adapted to this filtration, and, for each n, the increment $X_{n+1} - X_n$ is
independent of $\mathcal{F}_n$. It follows from Problem 27 and Proposition 5, both in
Chapter 23, that


$$E(Y_{n+1} \mid \mathcal{F}_n) = E\big((q/p)^{X_{n+1} - X_n}\big)\, Y_n = Y_n.$$

Thus Y is a bounded martingale. By the Optional Sampling Theorem,

$$E(Y_T) = E(Y_0) = (q/p)^{f_0}.$$

We may also compute $E(Y_T)$ in terms of $p(f_0, g)$:


$$E(Y_T) = p(f_0, g)(q/p)^g + (1 - p(f_0, g)).$$

Together, these two equalities imply that


$$p(f_0, g) = \frac{(q/p)^{f_0} - 1}{(q/p)^g - 1}.$$


Note that even though this expression is undefined for $p = 1/2$, it converges as
$p \to 1/2$ to the correct value for $p = 1/2$, namely, $f_0/g$.
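
The closed form for the probability of reaching the goal with unit bets can be cross-checked against the one-step equations $h(f) = p\,h(f+1) + q\,h(f-1)$ for $0 < f < g$, with $h(0) = 0$ and $h(g) = 1$. These equations are a standard first-step argument and are not the method used in the example; the sketch below, with hypothetical function names, only confirms that the formula is consistent with them.

```python
from fractions import Fraction


def reach_goal_probability(f0, g, p):
    """p(f0, g) for unit bets with win probability p, assuming 0 < p < 1."""
    p = Fraction(p)
    if p == Fraction(1, 2):
        return Fraction(f0, g)          # the martingale case
    r = (1 - p) / p                     # r = q / p
    return (r ** f0 - 1) / (r ** g - 1)


def satisfies_one_step_equations(g, p):
    """Check h(f) = p h(f+1) + q h(f-1) for 0 < f < g, with h(0) = 0 and h(g) = 1."""
    p = Fraction(p)
    h = [reach_goal_probability(f, g, p) for f in range(g + 1)]
    interior = all(h[f] == p * h[f + 1] + (1 - p) * h[f - 1] for f in range(1, g))
    return h[0] == 0 and h[g] == 1 and interior


p_black = Fraction(18, 38)              # the Las Vegas wheel of Example 2
assert satisfies_one_step_equations(10, p_black)
assert reach_goal_probability(5, 10, p_black) <= Fraction(5, 10)   # the bound (24.9)
```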


Problem 27. What general conclusion can be drawn from the preceding example 
about the optimization problem for Red and Black? 


Problem 28. For the preceding example, in the case that X is a simple (but not 
necessarily symmetric) random walk, calculate 


$$\lim_{g \to \infty} p(f_0, g)$$
and interpret the results. 


The next result is useful in computing expected values connected with stop- 
ping times for random walks. 


Theorem 14. [First Wald Identity] Let $S = (S_0 = 0, S_1, S_2, \dots)$ be a random
walk on R whose steps have finite mean. If T is a stopping time that satisfies the
sampling integrability conditions for S, then


$$E(S_T) = E(S_1)E(T), \tag{24.11}$$

where we interpret $0 \cdot \infty$ as 0.

PROOF. Let $\mu = E(S_1)$ and, for all n,

$$Y_n = S_n - n\mu.$$


The random sequence $Y = (Y_n : n \ge 0)$ is a random walk whose step distribution
has mean 0, so it is a martingale. It follows from Proposition 9 and the Optional
Sampling Theorem that, for all n,


$$0 = E(Y_{T \wedge n}) = E(S_{T \wedge n}) - \mu E(T \wedge n).$$

By the Monotone Convergence Theorem,


$$\mu E(T) = \mu \lim_{n \to \infty} E(T \wedge n) = \lim_{n \to \infty} E(S_{T \wedge n}). \tag{24.12}$$


By the first sampling integrability condition, $E(|S_T|) < \infty$. Thus, sampling
integrability condition (ii) and the Dominated Convergence Theorem imply
that


$$\liminf_{n \to \infty} E(|S_{T \wedge n} - S_T|) \le \liminf_{n \to \infty} E(|S_n| + |S_T|; [T > n]) \le \liminf_{n \to \infty} E(|S_n|; [T > n]) + \limsup_{n \to \infty} E(|S_T|; [T > n]) = 0.$$


Equation (24.11) follows from this equality and (24.12). □
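
For a bounded stopping time the identity can be verified exactly by enumerating paths. The sketch below is an illustration under assumed choices that do not come from the text: steps are $+1$ with probability p and $-1$ otherwise, and T is the first time the walk hits 1, truncated at a fixed horizon.

```python
from fractions import Fraction
from itertools import product


def wald_check(p, horizon):
    """Return (E(S_T), E(S_1) E(T)) for T = min(first time S hits 1, horizon)."""
    p = Fraction(p)
    e_s_t = Fraction(0)
    e_t = Fraction(0)
    for steps in product((1, -1), repeat=horizon):
        prob = Fraction(1)
        for x in steps:
            prob *= p if x == 1 else 1 - p
        s, t = 0, horizon
        for n, x in enumerate(steps, start=1):
            s += x
            if s == 1:          # the walk hits 1 at time n, so T = n and S_T = 1
                t = n
                break
        # If the loop ran to the end, then T = horizon and s = S_horizon.
        e_s_t += prob * s
        e_t += prob * t
    mean_step = 2 * p - 1       # E(S_1) for +1 / -1 steps
    return e_s_t, mean_step * e_t


lhs, rhs = wald_check(Fraction(1, 3), 6)
assert lhs == rhs               # both sides of (24.11) agree exactly
```

Because T depends on each path only through the steps up to time T, summing over full paths of length `horizon` gives the correct expectations.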


Remark 1. One might be tempted to prove the preceding theorem by ap- 
plying the Optional Sampling Theorem to the martingale Y and stopping time 
T. However, it is not easy to directly deduce that T satisfies the sampling inte- 
grability conditions for Y as a consequence of the fact that it satisfies them for 
S, although, as the reader is asked to show below, such an implication can be 
proved now that the above proof is complete. 


Problem 29. Use the First Wald Identity to show that if T satisfies the sampling 
integrability conditions for S having steps with a nonzero finite mean, then $E(T) < \infty$.
Then prove the implication mentioned in the last sentence of the preceding
remark. 


The First Wald Identity is not useful for computing E(T) when E(S1) = 0. 
In this case, the following result may help. 


Theorem 15. [Second Wald Identity] Let $S = (S_0 = 0, S_1, S_2, \dots)$ be a random
walk on $\mathbb{R}$ whose steps have mean 0 and finite variance. If $T$ is a stopping
time that satisfies the sampling integrability conditions for the random sequence
$(S_n^2 : n = 0, 1, 2, \dots)$, then


(24.13) $\mathrm{Var}(S_T) = \mathrm{Var}(S_1)E(T)$.


PARTIAL PROOF. Note first that the hypotheses imply that $T$ satisfies the
sampling integrability conditions for the sequence $S$. By the First Wald Identity,
$E(S_T) = 0$ and hence $\mathrm{Var}(S_T) = E(S_T^2)$. Let


$Z_n = S_n^2 - n \mathrm{Var}(S_1)$.

By Problem 41 of Chapter 23,

$E(Z_{n+1} \mid \sigma(S_0, \dots, S_n)) = Z_n$,


and so $(Z_0, Z_1, Z_2, \dots)$ is a martingale with respect to the minimal filtration of
$S$. The rest of the proof is similar to the proof of the First Wald Identity. □
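
The martingale $Z_n = S_n^2 - n\mathrm{Var}(S_1)$ gives $E(S_{T \wedge n}^2) = \mathrm{Var}(S_1)E(T \wedge n)$ for every $n$, which is the truncated form of (24.13). The following Python sketch checks this exactly for one example. It is illustrative only: the step distribution, the choice of $T$ as the first time $|S_k| \ge c$, and the names `second_wald_check` and `c` are assumptions made here, not taken from the text.

```python
from fractions import Fraction

def second_wald_check(steps, c, n):
    """steps maps step value -> probability (mean 0). T = first k with |S_k| >= c.

    Returns (E(S_{T^n}^2), Var(S_1) * E(T^n)) as exact fractions.
    """
    var = sum(q * d * d for d, q in steps.items())   # Var(S_1), since the mean is 0
    alive = {0: Fraction(1)}                         # mass of paths not yet stopped
    e_S2 = Fraction(0)
    e_T = Fraction(0)
    for k in range(1, n + 1):
        nxt = {}
        for pos, mass in alive.items():
            for d, q in steps.items():
                new = pos + d
                if abs(new) >= c:                    # stopped at time k
                    e_S2 += mass * q * new * new
                    e_T += mass * q * k
                else:
                    nxt[new] = nxt.get(new, 0) + mass * q
        alive = nxt
    for pos, mass in alive.items():                  # paths with T > n
        e_S2 += mass * pos * pos
        e_T += mass * n
    return e_S2, var * e_T

# Hypothetical mean-zero steps: -1 with probability 2/3, +2 with probability 1/3.
steps = {-1: Fraction(2, 3), 2: Fraction(1, 3)}
print(second_wald_check(steps, 3, 20))
```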


Problem 30. Complete the proof of the preceding theorem. 




Problem 31. Compute $E(T)$ for Example 5 (the Gambler's ruin example) in the
case that $X$ is a simple random walk on $\mathbb{Z}$, with $p \in [0,1]$. Compute the limiting
value of your answer as $g \to \infty$ and interpret the result.


Problem 32. Let $S$ be a random walk on $\mathbb{R}$ starting at 0, and let $T$ be a stopping
time. Assume that $E(T)$ and $E((S_1)^2)$ are both finite. Let $\mu = E(S_1)$. Prove that


$E([S_T - \mu T]^2) = \mathrm{Var}(S_1)E(T)$.


* Problem 33. Let $(S_n : n = 1, 2, \dots)$ be a random walk in $\mathbb{Z}^+$ with step distribution
$Q$ given by $Q\{0\} = Q\{1\} = \tfrac12$. Set $T = \inf\{n : S_n = 1\}$, and $T_n = T \wedge n$. Calculate
$E([S_{T_n} - \tfrac12 T_n]^2)$ and $\mathrm{Var}(S_{T_n})$ and describe the behavior of these quantities as a
function of $n$. Discuss the relevance to Theorem 15. Also, calculate


$E(\mathrm{Var}(S_{T_n} \mid T_n))$.


Repeat the calculations with $T_n$ replaced by $T$.


Problem 34. Let $S = (S_0 = 0, S_1, S_2, \dots)$ be a random walk on $\mathbb{R}$, and let $T$ be
a stopping time. Prove that if $E(|S_1|) < \infty$ and $E(|S_T|) = \infty$, then $E(T) = \infty$.
Also, show that if $E(S_1) = 0$, $E(S_1^2) < \infty$, and $E(S_T^2) = \infty$, then $E(T) = \infty$.


Problem 35. Show that the hypothesis on the stopping time $T$ in the Second Wald
Identity can be replaced by the hypothesis that $E(T) < \infty$, and that the two
hypotheses are equivalent if $\mathrm{Var}(S_1) > 0$.


24.7. Inequalities and convergence results 


We now turn our attention to some results that are similar to many of the 
properties discussed in Part 2 for sums of independent random variables. The 
Optional Sampling Theorem will play an important role in many of the proofs. 
We start with a result which can be used in much the same way as the Etemadi 
Inequality. 


Proposition 16. Let $X$ be a submartingale. Then, for $z > 0$,


$P[\max_{0 \le k \le n} X_k \ge z] \le \frac{1}{z} E(|X_n|)$.


PROOF. Let

$A = [\max_{0 \le k \le n} X_k \ge z]$.

Note that

$A = [X_{T \wedge n} \ge z]$,

where

$T(\omega) = \inf\{k \ge 0 : X_k(\omega) \ge z\}$.

Then


(24.14) $zP(A) \le E(X_{T \wedge n} ; A) \le E((X_{T \wedge n})^+)$.


By Proposition 6, $(X_k^+ : k \ge 0)$ is a submartingale, so by the Optional Sampling
Theorem,

$E((X_{T \wedge n})^+) \le E(X_n^+) \le E(|X_n|)$,

which, in combination with (24.14), finishes the proof. □
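
For a concrete feel for the inequality, both sides can be computed exactly for a short simple symmetric random walk, which is a martingale and hence a submartingale. The Python sketch below is an illustration that is not part of the original text. It enumerates all $2^n$ paths, so it is only practical for small $n$; the function name `maximal_inequality_sides` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def maximal_inequality_sides(n, z):
    """Return (P[max_{0<=k<=n} X_k >= z], E|X_n| / z) for a simple symmetric walk."""
    total = 2 ** n
    hits = 0        # number of paths whose running maximum reaches z
    abs_sum = 0     # sum of |X_n| over all paths
    for steps in product((-1, 1), repeat=n):
        pos, peak = 0, 0            # X_0 = 0 is included in the maximum
        for d in steps:
            pos += d
            peak = max(peak, pos)
        hits += peak >= z
        abs_sum += abs(pos)
    return Fraction(hits, total), Fraction(abs_sum, total) / z

for z in range(1, 6):
    lhs, rhs = maximal_inequality_sides(10, z)
    print(z, float(lhs), float(rhs))
```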


The previous proposition has many variations, of which the following is par- 
ticularly useful: 


Corollary 17. [Kolmogorov Inequality] Let $X$ be a martingale. Then for all
$z > 0$ and $p \ge 1$,


$P[\max_{0 \le k \le n} |X_k| \ge z] \le \frac{1}{z^p} E(|X_n|^p)$.


Problem 36. Prove the preceding corollary. 


Problem 37. Let $X$ be a martingale with respect to a filtration $(\mathcal{F}_n, n \ge 0)$.
Assume that


$\sup_{n \ge 0} E(X_n^2) < \infty$.


Prove that the sequence $X$ converges almost surely and in $L_2$ to a random variable
$Y$. Also show that $E(Y \mid \mathcal{F}_n) = X_n$ a.s. for all $n \ge 0$. Hint: First prove that $X$ is
Cauchy in $L_2$ by showing that


$E((X_n - X_m)^2) = E(X_n^2) - E(X_m^2), \quad 0 \le m \le n$.


Then imitate the proof of Proposition 25 of Chapter 12 to get almost sure conver- 
gence. Problem 12 of Chapter 8 may be useful. 


One of the conclusions in the preceding exercise motivates us to extend the
definition of martingale. We speak of a martingale $(X_n : n = 0, 1, 2, \dots, \infty)$
adapted to a filtration $(\mathcal{F}_n : n = 0, 1, 2, \dots, \infty)$ where, in addition to the usual
conditions, we require that $X_\infty$ be $\mathcal{F}_\infty$-measurable with finite mean and that,
for each $n$, both $\mathcal{F}_n \subseteq \mathcal{F}_\infty$ and


(24.15) $E(X_\infty \mid \mathcal{F}_n) = X_n$ a.s.


The extended definitions of submartingale and supermartingale are similar, with
the equality in (24.15) being replaced by $\ge$ and $\le$, respectively.

In Problem 37, we obtain such a martingale by setting $X_\infty = Y$. The assertion
in that problem that $X_n \to Y$ as $n \to \infty$ is a 'martingale convergence theorem'.
We will strengthen this result to a version in which there is no $L_2$-hypothesis.

In order to obtain such an improvement, we will need the notion of 'upcrossing'.
Let $X$ be a random sequence of $\mathbb{R}$-valued random variables. For $a < b$
define

$T_0(\omega) = \inf\{k \ge 0 : X_k(\omega) \le a\}$,

and having defined $T_{2n}$ for $n \ge 0$, let


$T_{2n+1}(\omega) = \inf\{k > T_{2n}(\omega) : X_k(\omega) \ge b\}$

and

$T_{2n+2}(\omega) = \inf\{k > T_{2n+1}(\omega) : X_k(\omega) \le a\}$.


Informally speaking, the times $T_n$ are the successive times at which the sequence
$X$ crosses the interval $[a, b]$. When $n$ is odd, $T_n$ is the time at which an upcrossing
is completed. For $p \ge 0$, let


$U_p = \#\{n : n \text{ is odd and } T_n \le p\}$.


Thus, $U_p$, illustrated in Figure 24.1 for $p = 15$, is the number of upcrossings of
the interval $[a, b]$ made by the finite sequence $(X_0, \dots, X_p)$.


FIGURE 24.1. Instances of exactly two upcrossings by time 15
($U_{15} = 2$)
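
The definition of $U_p$ translates into a single pass over the finite sequence. The Python sketch below is an illustration, not part of the original text; the function name `upcrossings` and the sample path are hypothetical. It waits for a value at or below $a$, then counts one upcrossing when a later value is at or above $b$, and repeats.

```python
def upcrossings(xs, a, b):
    """Number of upcrossings of [a, b] completed by the finite sequence xs (a < b)."""
    count = 0
    below = False    # True once a value <= a has been seen since the last completion
    for x in xs:
        if not below:
            if x <= a:           # an even-indexed crossing time T_{2n}
                below = True
        elif x >= b:             # an odd-indexed crossing time T_{2n+1}
            count += 1
            below = False
    return count

path = [3, 0, 1, 4, 2, -1, 5, 5, 0]
print(upcrossings(path, 0, 4))
```

For this sample path the values at or below 0 occur at indices 1, 5 and 8, and values at or above 4 follow the first two of them, so two upcrossings are completed.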


Lemma 18. [Doob Upcrossing] Let $X$ be a submartingale. Fix real numbers
$b > a$. Then for all integers $p \ge 0$,


$E(U_p) \le \frac{E((X_p - a)^+)}{b - a}$.


PROOF. It is easily checked that the crossing times $T_n$ defined above are
stopping times with respect to the minimal filtration of $X$. Fix $p \ge 0$ and
let $Y_n = X_{T_n \wedge p}$ for all $n \ge 0$. If $k < U_p$, then $Y_{2k-1} \ge b$ and $Y_{2k} \le a$, so
$(b - a) \le Y_{2k-1} - Y_{2k}$. If $k = U_p$, then $Y_{2k-1} \ge b$ and either $Y_{2k} \le a$ or
$Y_{2k} = X_p > a$. In either case, $(b - a) \le Y_{2k-1} - Y_{2k} + (X_p - a)^+$. If $k > U_p$,
then $Y_{2k-1} = Y_{2k} = X_p$, so $Y_{2k-1} - Y_{2k} = 0$. Thus,


$(b - a)U_p \le (X_p - a)^+ + \sum_{k=1}^{\infty} (Y_{2k-1} - Y_{2k}) = (X_p - a)^+ + \sum_{k=1}^{p} (Y_{2k-1} - Y_{2k})$.


By the Optional Sampling Theorem, the expected value of each term in the sum
is nonpositive. Thus, the lemma follows by taking the expected value of both
sides. □
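
The bound in the lemma can be compared with the exact value of $E(U_p)$ in a small case. The Python sketch below is illustrative and not part of the original text. It assumes $X$ is a simple symmetric random walk started at 0 (a martingale, hence a submartingale) and enumerates all $2^p$ paths; the function name `upcrossing_bound_sides` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def upcrossing_bound_sides(p, a, b):
    """Return (E(U_p), E((X_p - a)^+) / (b - a)) for a simple symmetric walk, a < b."""
    total = 2 ** p
    up_sum = 0       # sum of U_p over all paths
    plus_sum = 0     # sum of (X_p - a)^+ over all paths
    for steps in product((-1, 1), repeat=p):
        pos, count = 0, 0
        below = 0 <= a                 # X_0 = 0 may already be at or below a
        for d in steps:
            pos += d
            if not below:
                below = pos <= a
            elif pos >= b:             # an upcrossing is completed
                count += 1
                below = False
        up_sum += count
        plus_sum += max(pos - a, 0)
    return Fraction(up_sum, total), Fraction(plus_sum, total) / (b - a)

lhs, rhs = upcrossing_bound_sides(10, -1, 1)
print(float(lhs), float(rhs))
```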


Theorem 19. [Submartingale Convergence] Let $X$ be a submartingale. If

$\liminf_{n \to \infty} E(|X_n|) < \infty$,

then there exists a random variable $Y$ such that $E(|Y|) < \infty$ and


$\lim_{n \to \infty} X_n = Y$ a.s.


Problem 38. Prove the previous theorem. Hint: Let $Y = \liminf X_n$ and $Z =
\limsup X_n$. Use the Doob Upcrossing Lemma to show that for all real numbers
$a < b$,

$P[Y < a \text{ and } b < Z] = 0$.


Conclude that $Y = Z$ a.s. Then use the Fatou Lemma to prove that $E(|Y|) < \infty$.


When Theorem 19 is applied to martingales it is usually called the Martingale 
Convergence Theorem. 


Problem 39. Prove that if $X$ is a nonnegative supermartingale, then there exists
a random variable $Y$ with finite mean such that $Y = \lim X_n$ a.s.


Problem 40. Find an example to show that it need not be the case that E(Y) = 
lim E(Xn) in Theorem 19. Hint: Look back at the Double or Nothing example. 


The hypotheses in Theorem 19 say nothing about second moments, so, unlike
Problem 37, there is no $L_2$ conclusion. One might have expected an $L_1$ conclusion,
but the preceding exercise shows that an additional hypothesis is needed
to get such a conclusion.


Theorem 20. Let $X$ be a submartingale with respect to a filtration $(\mathcal{F}_n, n =
0, 1, 2, \dots)$. Then $X$ is uniformly integrable if and only if there exists a random
variable $Y$ such that


(24.16) $\lim_{n \to \infty} E(|X_n - Y|) = 0$.

Furthermore, when (24.16) holds,

$Y = \lim_{n \to \infty} X_n$ a.s.

and, with $X_\infty$ defined to equal $Y$, $(X_n : n \in \bar{\mathbb{Z}}^+)$ is a submartingale with respect
to $(\mathcal{F}_n : n = 0, 1, 2, \dots, \infty)$, where $\mathcal{F}_\infty = \sigma(\mathcal{F}_n : n \ge 0)$.


PROOF. First assume that (24.16) holds for some random variable $Y$. For
sufficiently large $n$,


$E(|Y|) \le E(|X_n - Y|) + E(|X_n|) < \infty$,

so $E(|Y|) < \infty$. Since

$E(|X_n|) \le E(|X_n - Y|) + E(|Y|)$,


it follows that $X$ satisfies the hypotheses of the Submartingale Convergence
Theorem. Thus, there exists a random variable $Z$ such that $Z = \lim X_n$ a.s.
Since $X_n$ converges to $Y$ in $L_1$, we must have $Y = Z$ a.s. It now follows from
the Uniform Integrability Criterion that $X$ is uniformly integrable.

Now assume that $X$ is a uniformly integrable submartingale. Then $E(|X_n|)$
must be bounded in $n$, so the conditions of the Submartingale Convergence
Theorem hold. Thus, there exists a random variable $Y$ with finite mean such
that $Y = \lim X_n$ a.s. By the Uniform Integrability Criterion, $E(|X_n - Y|) \to 0$.

It remains to prove that $E(Y \mid \mathcal{F}_n) \ge X_n$ when the equivalent conditions
of uniform integrability and (24.16) hold. Fix $A \in \mathcal{F}_n$ for some $n \ge 0$. Then
$E(|Y - X_m| ; A) \to 0$ as $m \to \infty$, and so


$E(E(Y \mid \mathcal{F}_n) ; A) = E(Y ; A) = \lim_{m \to \infty} E(X_m ; A)$
$= \lim_{m \to \infty} E(E(X_m \mid \mathcal{F}_n) ; A) \ge E(X_n ; A)$.

It follows from Problem 30 of Chapter 8 that $E(Y \mid \mathcal{F}_n) \ge X_n$ a.s., and the
proof is complete. □


Corollary 21. A sequence $X$ of $\mathbb{R}$-valued random variables is a uniformly
integrable martingale with respect to a filtration $(\mathcal{F}_n, n \ge 0)$ if and only if there
exists a random variable $Y$ having finite expectation such that


$E(Y \mid \mathcal{F}_n) = X_n$ a.s.

for all $n \in \mathbb{Z}^+$. Furthermore, when this condition holds, then

$E(Y \mid \mathcal{F}_\infty) = \lim_{n \to \infty} X_n$ a.s.,

where

$\mathcal{F}_\infty = \sigma(\mathcal{F}_n : n \ge 0)$.


* Problem 41. Prove the preceding corollary. 


* Problem 42. Apply Theorem 20 to the martingale in Example 1. 




Example 6. Let X be a sub- or supermartingale. Let us imagine that X 
represents the fortune of a gambler. In the real world, fortunes are measured 
in some indivisible monetary unit (such as pennies). Actual fortunes are also 
bounded by some number b representing the total amount of wealth in the world. 
Thus, even if we allow the gambler to borrow money, we may assume that X is 
a sequence of random variables which take integer values in the interval [—b, b] 
for some b > 0. Since X is bounded, it is uniformly integrable. By Theorem 20, 
$\lim X_n = Y$ a.s. for some random variable $Y$. Since $X$ is $\mathbb{Z}$-valued, it follows
that for almost all $\omega$, there exists a positive integer $N(\omega)$ such that


$Y(\omega) = X_n(\omega)$


for all $n \ge N(\omega)$. In other words, if the game can be represented as a martingale,
submartingale, or supermartingale (that is, the game is either fair, favorable, or 
unfavorable), then no matter what strategy is employed, there must be a last 
play, provided that bets of size 0 are prohibited. The gambler cannot play 
forever. In practice, this last play occurs when the gambler cannot borrow any 
more money or the casino goes broke or the gambler ‘chooses’ to stop gambling 
(because of death or lack of time, for instance). If X is a supermartingale, then 
Theorem 20 implies that the expected value of the final fortune is less than or 
equal to the expected value of the initial fortune, just as we might expect. 


Problem 43. In what way must the preceding discussion be modified if we drop 
the assumption that money comes in indivisible units, but retain the assumption 
that fortunes are bounded by some finite quantity? 


Problem 44. For $n \ge 0$ and $x \in [0, 1)$, let $I(n, x)$ be the unique interval of the form

$[(k - 1)/2^n, k/2^n)$

that contains $x$. Let $\lambda$ denote Lebesgue measure on $[0, 1)$. Prove that for any
measurable $\mathbb{R}$-valued function $f$ defined on $[0, 1)$,


$f(x) = \lim_{n \to \infty} \frac{1}{\lambda(I(n, x))} \int_{I(n, x)} f \, d\lambda$


for $\lambda$-almost all $x \in [0, 1)$ (see Problem 6).


Theorem 22. [Reverse Submartingale Convergence] Let $(Z_1, Z_2, \dots)$ be a
reverse submartingale. Then there exists an $\mathbb{R}$-valued random variable $Y$ such
that


$\lim_{n \to \infty} Z_n = Y$ a.s.




PROOF. For $n \ge 0$, let $U_n$ be the number of upcrossings of the interval $[a, b]$ by
the sequence $(Z_0, \dots, Z_n)$ as in the statement of the Doob Upcrossing Lemma.
This differs by at most one from the number of upcrossings of the interval $[a, b]$
by the sequence $(X_{-n}, X_{-n+1}, \dots, X_0)$, where $X_{-k} = Z_k$ for all $k \ge 0$. Let
$\mathcal{F}_{-k} = \mathcal{G}_k$ for $k \ge 0$. By the definition of reverse submartingale,


$E(X_{-n+1} \mid \mathcal{F}_{-n}) \ge X_{-n}$ a.s.


for all $n > 0$. Thus, except for the indexing set, the sequence $X$ behaves like a
submartingale, and we can apply the Doob Upcrossing Lemma to obtain


$E(U_n) \le \frac{1}{b - a} E((X_0 - a)^+)$.


The right side of this expression is independent of n. Now follow the rest of the 
proof of convergence in the proof of the Submartingale Convergence Theorem 
(see Problem 38). O 


* Problem 45. Construct an example of a reverse submartingale whose limit is finite 
a.s. but does not have finite expectation. 


The next lemma shows that a reverse martingale cannot constitute a solution 
of the preceding exercise. 


Lemma 23. A reverse martingale is uniformly integrable. 


PROOF. Let $(Z_n : n \in \mathbb{Z}^+)$ be a reverse martingale with respect to a reverse
filtration $(\mathcal{G}_n : n \in \mathbb{Z}^+)$. For each $n$ let $Y_n = |Z_n|$. We will complete the proof
by showing that this nonnegative reverse submartingale is uniformly integrable.

For each $n \in \mathbb{Z}^+$ and each $r \in \mathbb{R}^+$, let $A_{n,r} = [Y_n > r]$. Since $Y_n \le E(Y_0 \mid \mathcal{G}_n)$
and $A_{n,r} \in \mathcal{G}_n$, we have, for any $m > 0$,


$E(Y_n ; A_{n,r}) \le E(E(Y_0 \mid \mathcal{G}_n) ; A_{n,r}) = E(Y_0 ; A_{n,r})$

(24.17) $\le mP(A_{n,r}) + E(Y_0 ; [Y_0 > m])$.


By the Dominated Convergence Theorem, the second term on the right goes to
0 as $m \to \infty$. Thus, to show that the right side (and therefore the left side) of
(24.17) goes to 0 uniformly in $n$ as $r \to \infty$, we only need show that $P(A_{n,r}) \to 0$
as $r \to \infty$ uniformly in $n$. By the Markov Inequality,


$P(A_{n,r}) \le r^{-1} E(Y_n) \le r^{-1} E(Y_0)$,


the right hand inequality being a consequence of the reverse submartingale property.
The rightmost term does not depend on $n$ and goes to 0 as $r \to \infty$, as
desired. □




Theorem 24. [Reverse Martingale Convergence] Let $(Z_n : n \ge 0)$ be a reverse
martingale with respect to a reverse filtration $(\mathcal{G}_n : n \ge 0)$. Then there
exists a random variable $Z_\infty$ such that the sequence $(Z_n : n \ge 0)$ converges to
$Z_\infty$ both in $L_1$ and a.s. Moreover, $(Z_n : n = 0, 1, 2, \dots, \infty)$ is a reverse martingale
with respect to the filtration $(\mathcal{G}_n : n = 0, 1, 2, \dots, \infty)$, where

$\mathcal{G}_\infty = \bigcap_{n=1}^{\infty} \mathcal{G}_n$.


Problem 46. Prove the preceding theorem. Hint: Use Lemma 23. 


Problem 47. Use the preceding theorem to prove the Strong Law of Large Numbers 
in the finite mean case. Hint: Refer back to Example 3. 


24.8. Optimal strategy in Red and Black


We complete this chapter by solving the optimization problem for the game Red 
and Black, Example 2. Denote by p the probability that the gambler wins a 
given bet. In the original example, p = 1, but any value of p will do for the 
first portion of this section and any value between 0 and $\tfrac12$ will do for the latter
portion. We assume that the gambler’s only goal is to achieve a fortune of at 
least g, where g is some positive number. The probability that she reaches this 
goal is a function 
$(x, \gamma) \mapsto \Pi(x, \gamma)$

of the initial fortune $x$ and the strategy $\gamma$ that she employs. We wish to find
a strategy $\gamma$ which is optimal, in the sense that it maximizes $\Pi(x, \gamma)$ for all $x$.
More precisely, an optimal strategy $\gamma$ is one which has the property that for any
nonnegative $x$ and any strategy $\tau$,


$\Pi(x, \tau) \le \Pi(x, \gamma)$.


An important class of strategies is that consisting of all stationary strategies,
namely those strategies $\gamma$ such that


$\gamma_{n+1}(f_0, f_1, \dots, f_n) = \gamma_1(f_n)$


for all $n \ge 0$. The stationary strategies are those for which the $(n + 1)$st bet
depends only on the fortune after $n$ bets. It should not be too surprising that in
many settings optimal strategies are stationary.


Problem 48. Let $X$ be a random sequence of fortunes corresponding to some initial
fortune $f_0$ and some stationary strategy $\gamma$ in the game of Red and Black. Prove
that the sequence


$(\Pi(X_0, \gamma), \Pi(X_1, \gamma), \Pi(X_2, \gamma), \dots)$


is a martingale with respect to the minimal filtration of $X$.




We will find that there is a stationary optimal strategy in the game of Red 
and Black. The following result, which is a special case of what is sometimes 
known as the Fundamental Theorem of Gambling, provides us with a criterion 
for recognizing such a strategy. 


Theorem 25. Let $\gamma$ be a stationary strategy with the property that $\gamma_1(x) = 0$
for all $x \ge g$. Then $\gamma$ is optimal for the goal of achieving a fortune which is at
least $g$ in the game of Red and Black if and only if for all nonnegative $x$ and all
$s \in [0, x]$,

(24.18) $\Pi(x, \gamma) \ge p\,\Pi(x + s, \gamma) + (1 - p)\,\Pi(x - s, \gamma)$.


PROOF. First suppose that (24.18) fails for some $x$ and $s$. Let $\tau$ be the
strategy (which is not stationary) defined by


$\tau_1(x) = s$ and $\tau_1(y) = \gamma_1(y)$, $y \ne x$,


and $\tau_n = \gamma_n$ for $n > 1$. Let $X$ be the random sequence of fortunes corresponding
to the initial fortune $x$ and the strategy $\tau$ in the game of Red and Black. Since
the strategies $\tau$ and $\gamma$ are identical after the first bet,


$\Pi(x, \tau) = p\,\Pi(x + s, \gamma) + (1 - p)\,\Pi(x - s, \gamma) > \Pi(x, \gamma)$.


Thus, $\gamma$ is not optimal.

Now suppose that (24.18) holds for all nonnegative $x$ and all $s \in [0, x]$. Let $X$
be a random sequence of fortunes corresponding to an initial fortune of $f_0$ and
an arbitrary (not necessarily stationary) strategy $\tau$, and let $(\mathcal{F}_n, n \ge 0)$ be the
minimal filtration of $X$. By (24.18),


$E(\Pi(X_{n+1}, \gamma) \mid \mathcal{F}_n) = p\,\Pi(X_n + S_n, \gamma) + (1 - p)\,\Pi(X_n - S_n, \gamma)$
$\le \Pi(X_n, \gamma)$,


where $S_n = \tau_{n+1}(X_0, \dots, X_n)$. Thus, the sequence $(\Pi(X_n, \gamma), n \ge 0)$ is a
supermartingale. By the Optional Sampling Theorem,


(24.19) $\Pi(f_0, \gamma) \ge E(\Pi(X_{T \wedge n}, \gamma))$

for all $n \ge 0$, where

$T(\omega) = \inf\{m \ge 0 : X_m(\omega) \ge g\}$.

Clearly, $E(\Pi(X_{T \wedge n}, \gamma)) \ge P[X_{T \wedge n} \ge g]$, so, by (24.19),


$\Pi(f_0, \gamma) \ge P[X_{T \wedge n} \ge g]$.

Let $n \to \infty$ to obtain $\Pi(f_0, \gamma) \ge \Pi(f_0, \tau)$, as desired. □


Problem 49. Generalize the preceding theorem to roulette games in which there is 
a greater variety of bets available to the gambler (see Problem 5). 




If we are given a strategy y, Theorem 25 provides us with a way to check 
whether or not y is optimal. Unfortunately, it does not tell us how to construct 
an optimal strategy. For that task, we must make an educated guess. If our goal 
is to achieve a fortune of at least $g$ in the game of Red and Black with $p \le \tfrac12$,
our intuition tells us that a strategy calling for many small stakes is not likely to 
be optimal. We might guess that an optimal strategy involves only large stakes. 
Of course, we are not allowed to bet stakes which are larger than our fortune, 
and it seems reasonable that we would not want to bet more than is needed to 
reach our goal. These considerations lead to the stationary strategy y known as 
bold play: 


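$\gamma_{n+1}(f_0, \dots, f_n) = \begin{cases} f_n \wedge (g - f_n) & \text{if } 0 \le f_n \le g \\ 0 & \text{otherwise.} \end{cases}$

Theorem 26. In the supermartingale versions of Red and Black, bold play is
an optimal strategy for achieving a fortune of at least $g$.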


PROOF. By a change of scale, if necessary, we may assume that $g = 1$. Since
the case $p = 0$ is trivial, we may also assume that $p > 0$. Let $h(x) = \Pi(x, \gamma)$,
where $\gamma$ denotes bold play. In other words, $h(x)$ is the probability that the
gambler who starts with an initial fortune $x$ reaches the goal by following bold
play. By Theorem 25, to show the optimality of bold play it suffices to show
that


(24.20) $h(x) \ge p\,h(x + s) + (1 - p)\,h(x - s), \quad 0 \le s \le x \le 1$.


Let us first determine some properties of the function $h$. It is obvious that
$h$ takes values in $[0, 1]$ and that $h(0) = 0$ and $h(1) = 1$. If the gambler has a
fortune between 0 and $\tfrac12$ and plays boldly, her fortune doubles with probability
$p$ and vanishes with probability $1 - p$. From this and a similar consideration
involving fortunes between $\tfrac12$ and 1, we deduce that


(24.21) $h(x) = \begin{cases} p\,h(2x) & \text{if } 0 \le x \le \tfrac12 \\ p + (1 - p)\,h(2x - 1) & \text{if } \tfrac12 \le x \le 1. \end{cases}$

We will now prove by induction on $n \ge 0$ that (24.20) holds for $x \in [0, 1]$ of
the form

$\frac{k}{2^n}$.

The case $n = 0$ is easily checked. Now consider the case $n = 1$. We must show
that for all $s \in [0, \tfrac12]$,


$h(\tfrac12) \ge p\,h(\tfrac12 + s) + (1 - p)\,h(\tfrac12 - s)$.


By (24.21),


$h(\tfrac12) - p\,h(\tfrac12 + s) - (1 - p)\,h(\tfrac12 - s)$
$= p - p[p + (1 - p)h(2s)] - (1 - p)p\,h(1 - 2s)$
$= p(1 - p)[1 - (h(2s) + h(1 - 2s))]$.


That this last expression is nonnegative follows from the fact in Problem 26 that
$h(x) \le x$ for $x \ge 0$.

We are now ready for the inductive step. Fix $n > 1$ and let $x = k/2^n$ for some
positive integer $k \le 2^n$. Also, choose $s \in [0, x]$. We will consider several cases.
First, assume that $x + s \le \tfrac12$. Then (24.21) and the inductive hypothesis imply
that


$p\,h(x + s) + (1 - p)\,h(x - s) = p(p\,h(2x + 2s)) + (1 - p)(p\,h(2x - 2s))$
$\le p\,h(2x) = h(x)$.


The case in which $x - s \ge \tfrac12$ is treated in a similar fashion. Now assume that
$x \le \tfrac12$ and $x + s \ge \tfrac12$. Then by (24.21)

$h(x) - p\,h(x + s) - (1 - p)\,h(x - s)$
$= p\,h(2x) - p[p + (1 - p)h(2x + 2s - 1)] - (1 - p)p\,h(2x - 2s)$
$= p[h(2x) - p - (1 - p)h(2x + 2s - 1) - (1 - p)h(2x - 2s)]$.

In this case, it is necessarily true that $x \ge \tfrac14$, since otherwise $s < \tfrac14$ and then
$x + s < \tfrac12$. Therefore, $2x \ge \tfrac12$, so further applications of (24.21) imply that the
right side of the last equation equals


$p[p + (1 - p)h(4x - 1) - p - (1 - p)h(2x + 2s - 1) - (1 - p)h(2x - 2s)]$
$= p(1 - p)[h(4x - 1) - h(2x + 2s - 1) - h(2x - 2s)]$
$= (1 - p)[h(2x - \tfrac12) - p\,h((2x - \tfrac12) + (2s - \tfrac12)) - p\,h((2x - \tfrac12) - (2s - \tfrac12))]$.


Since $p \le \tfrac12$, the right side of this expression is bounded below by


$(1 - p)[h(2x - \tfrac12) - p\,h((2x - \tfrac12) + (2s - \tfrac12)) - (1 - p)\,h((2x - \tfrac12) - (2s - \tfrac12))]$,


which is nonnegative, by the inductive hypothesis. The verification of (24.20) is
similar for the case in which $x \ge \tfrac12$ and $x - s \le \tfrac12$.

Thus we have checked (24.20) for $x$ in a set which is dense in $[0, 1]$, so in
order to show that (24.20) holds for all $x \in [0, 1]$, it is enough to prove that $h$ is
continuous. To show this, we will use the fact (Problem 50 following this proof)
that $h$ is increasing. In particular, at each $x$, the right and left limits $h(x+)$ and
$h(x-)$ exist. Since we know that (24.20) holds for $x$ in a dense subset of $[0, 1]$,
we can take limits from the left to obtain


$h(x-) \ge p\,h((x + s)-) + (1 - p)\,h((x - s)-)$


for all $0 < s < x < 1$. Now let $s \searrow 0$ to see that

$h(x-) \ge p\,h(x+) + (1 - p)\,h(x-)$.


Since $h(x+) \ge h(x-)$ and $p > 0$, it follows immediately that $h(x+) = h(x-)$, so
$h$ is continuous. □
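
For dyadic fortunes the recursion (24.21) terminates, so $h$ and the inequality (24.20) can be evaluated exactly. The Python sketch below is an illustration only, not part of the original text; the names `h` and `worst_gap` and the restriction of the stake to $s \le x \wedge (1 - x)$ are choices made here. It returns the smallest value of $h(x) - p\,h(x+s) - (1-p)\,h(x-s)$ over a dyadic grid.

```python
from fractions import Fraction

def h(x, p):
    """Probability of reaching the goal 1 under bold play, for dyadic x, via (24.21)."""
    if x <= 0:
        return Fraction(0)
    if x >= 1:
        return Fraction(1)
    if x <= Fraction(1, 2):
        return p * h(2 * x, p)
    return p + (1 - p) * h(2 * x - 1, p)

def worst_gap(p, n):
    """Minimum of h(x) - p h(x+s) - (1-p) h(x-s) over x = k/2^n, s = j/2^n."""
    N = 2 ** n
    gap = None
    for k in range(N + 1):
        for j in range(min(k, N - k) + 1):      # stakes with 0 <= s <= min(x, 1 - x)
            x, s = Fraction(k, N), Fraction(j, N)
            g = h(x, p) - p * h(x + s, p) - (1 - p) * h(x - s, p)
            gap = g if gap is None else min(gap, g)
    return gap

print(worst_gap(Fraction(9, 19), 5))   # a subfair value of p
print(worst_gap(Fraction(3, 4), 3))    # a superfair value of p, outside Theorem 26
```

With $p = \tfrac34$, $x = \tfrac12$ and $s = \tfrac14$ the gap equals $p(1-p)(1-2p) < 0$, so the restriction to the supermartingale case in Theorem 26 cannot simply be dropped.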


Problem 50. Prove that the function h of the preceding proof is increasing. Hint: 
Compare the fortunes of two gamblers who have different initial fortunes and who 
simultaneously follow the bold play strategy at the same roulette wheel. 


CHAPTER 25 
Renewal Sequences 


The main subject of this chapter is a class of random sequences that are defined
in terms of random walks $T = (T_m : m = 0, 1, 2, \dots)$ in $\bar{\mathbb{Z}}^+$ satisfying $T_{m+1}(\omega) \ge
1 + T_m(\omega)$ (with the understanding that $\infty \ge 1 + \infty$). The random sequence
$X = (X_n : n \in \mathbb{Z}^+)$, defined by


$X_n(\omega) = \begin{cases} 1 & \text{if } n = T_m(\omega) \text{ for some } m \ge 0 \\ 0 & \text{otherwise} \end{cases}$


is known as the renewal sequence corresponding to $T$. The correspondence between
$T$ and $X$ is one-to-one; the random walk $T$ corresponding to a renewal
sequence $X$ is given by $T_0(\omega) = 0$ and, for $m > 0$,


(25.1) $T_m = \inf\{n > T_{m-1} : X_n = 1\}$,


where, as usual, the infimum of the empty set is understood to be $+\infty$.
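
The two directions of this correspondence can be sketched in a few lines of Python. The code is illustrative and not part of the original text; the function names are hypothetical, and only a finite window $X_0, \dots, X_{L-1}$ of the sequence is represented, so a renewal time falling outside the window is reported as infinite (an assumption of this sketch, not of the definition).

```python
import math

def renewal_sequence(times, length):
    """X_0, ..., X_{length-1} built from renewal times T_0 = 0 < T_1 < ... ."""
    marked = {t for t in times if t != math.inf}
    return [1 if n in marked else 0 for n in range(length)]

def renewal_times(xs, count):
    """T_0, ..., T_count recovered from xs via (25.1); inf of the empty set is +inf."""
    times = [0]
    for _ in range(count):
        prev = times[-1]
        nxt = math.inf
        if prev != math.inf:
            for n in range(prev + 1, len(xs)):
                if xs[n] == 1:       # first n > T_{m-1} with X_n = 1
                    nxt = n
                    break
        times.append(nxt)
    return times

X = renewal_sequence([0, 2, 3, 7], 10)
print(X)
print(renewal_times(X, 4))
```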

In this chapter we will see that renewal sequences are important tools for 
studying random walks. In Chapter 26, they will be used in the analysis of 
Markov sequences. In physical applications, renewal sequences are used to model 
situations in which a task is repeated indefinitely, with the quantities $T_m$
representing the (integer) times at which the repetitions of the task begin. Our
assumption that $T$ be a random walk means that the lengths of time taken to
complete the repetitions are iid. The integers $n > 0$ for which $X_n = 1$ are called
renewal times, consistent with practice in renewal theory of using the noun 'time'
for the target of the random walk and thus for the domain of the renewal sequence.
(Notice that we do not call 0 a renewal time even though $X_0 = 1$ a.s.)
If $T_m < \infty$, then it is the $m$th renewal time and the time difference $T_m - T_{m-1}$
is the $m$th waiting time. The distribution of the first renewal time $T_1$ is known
as the waiting time distribution. This distribution is, of course, the same as the
step distribution of the random walk $T$.




25.1. Basic criterion 


Suppose that X is a random sequence of 0’s and 1’s. One might ask if it is a 
renewal sequence, or equivalently, whether the sequence $T$ defined by $T_0 = 0$ and
(25.1) is a random walk. The following proposition gives a useful criterion for a 
random sequence to be a renewal sequence, a criterion that does not involve the 
corresponding random walk. Roughly speaking, it says that renewal sequences 
are sequences of 0’s and 1’s that start over independently after each occurrence 
of a 1; that is, they ‘regenerate’ themselves. 


Proposition 1. A random sequence $X = \{X_0, X_1, X_2, \dots\}$ of 0's and 1's is
a renewal sequence if and only if $X_0 = 1$ a.s. and


(25.2) $P[X_n = x_n \text{ for } 0 < n \le r + s] = P[X_n = x_n \text{ for } 0 < n \le r]\, P[X_{n-r} = x_n \text{ for } r < n \le r + s]$

for all positive integers $r$ and $s$ and sequences $(x_1, \dots, x_{r+s})$ of 0's and 1's such
that $x_r = 1$.


* Problem 1. Prove the preceding proposition. 


Note that if $x_r = 0$, (25.2) does not necessarily hold for a renewal sequence
$(X_0, X_1, X_2, \dots)$.
If $X$ is a renewal sequence, the random set


$\Sigma = \{n : X_n = 1\}$


is called a regenerative set in $\mathbb{Z}^+$. The study of renewal sequences is equivalent
to the study of regenerative sets in $\mathbb{Z}^+$.

The following result describes many regenerative sets connected with an ar- 
bitrary random walk in R. Figure 25.1 illustrates two of these sets, one located 
along the horizontal axis and the other located along the vertical axis in the 
graph of a random walk. 


Proposition 2. Let $S = (S_0 = 0, S_1, S_2, \dots)$ be a random walk in $\mathbb{R}$. Then
each of the following conditions defines a regenerative set in $\mathbb{Z}^+$:


(25.3) $\{n : S_n = 0\}$;

(25.4) $\{n : S_n > S_k \text{ for } 0 \le k < n\}$;
(25.5) $\{n : S_n < S_k \text{ for } 0 \le k < n\}$;
(25.6) $\{n : S_n \ge S_k \text{ for } 0 \le k < n\}$;
(25.7) $\{n : S_n \le S_k \text{ for } 0 \le k < n\}$.


If $S$ is a random walk in $\mathbb{Z}$, then each of the following conditions also defines a
regenerative set in $\mathbb{Z}^+$:

(25.8) $\{S_n : S_n > S_k \text{ for } 0 \le k < n\}$;

(25.9) $\{-S_n : S_n < S_k \text{ for } 0 \le k < n\}$.


FIGURE 25.1. The regenerative sets $\{n : S_n > S_k, 0 \le k < n\}$
and $\{S_n : S_n > S_k, 0 \le k < n\}$
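
For a given path, the sets in (25.4) and (25.8) are easy to extract. The Python sketch below is illustrative and not part of the original text; the function name `strict_ascending_ladder` and the sample steps are hypothetical. It returns the times at which the walk strictly exceeds all earlier values, together with the values attained at those times. Time 0 always qualifies, since the condition is vacuous there.

```python
def strict_ascending_ladder(steps):
    """Return (times, heights): n with S_n > S_k for all 0 <= k < n, and the S_n there."""
    s = 0
    best = None                  # running maximum of S_0, ..., S_{n-1}
    times, heights = [], []
    for n, d in enumerate([0] + list(steps)):
        s += d                   # s is now S_n (the leading 0 gives S_0 = 0)
        if best is None or s > best:
            times.append(n)
            heights.append(s)
            best = s
    return times, heights

# Sample path S = 0, 1, 0, 1, 2, 0, 3
print(strict_ascending_ladder([1, -1, 1, 1, -2, 3]))
```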


Problem 2. Check that Theorem 12 of Chapter 11 applies directly to show that 
(25.3) is a regenerative set. Then prove that one of (25.4), (25.5), (25.6), and (25.7) 
is a regenerative set. Finally, show that one of (25.8) and (25.9) is a regenerative 
set. (The rest of the proof of the proposition will be omitted.) 


Later in this chapter, we will analyze some of the regenerative sets in Propo- 
sition 2. 


25.2. Renewal measures and potential measures 


Let $(X_0, X_1, X_2, \dots)$ denote a renewal sequence. The random $\sigma$-finite measure
$N$ given by

$N(B) = \sum_{n \in B} X_n$

is called the renewal measure of the renewal sequence $X = (X_0, X_1, X_2, \dots)$;
it describes the number of renewals occurring during a time set $B$. By the
Monotone Convergence Theorem, the function


$B \mapsto E(N(B)) = E\left(\sum_{n \in B} X_n\right)$


is a (nonrandom) $\sigma$-finite measure; it is the potential measure of the renewal
sequence $X$. The density of the potential measure $U$ of $X$ with respect to
counting measure is the potential sequence of $X$. Thus, the potential sequence
$(u_0, u_1, u_2, \dots)$ satisfies $u_0 = 1$ and


$u_n = U\{n\} = P[X_n = 1]$.


Let $R$ be a waiting time distribution for a renewal sequence $X$ corresponding
to a random walk $T$ in $\bar{\mathbb{Z}}^+$. For any set $B \subseteq \mathbb{Z}^+$ and any nonnegative integer $m$,
it follows from the definitions that


$R^{*m}(B) = P[T_m \in B]$,


where $R^{*0}$ is the delta distribution at 0. This observation will be used in the proof
of the following result that relates the potential measure of a renewal sequence
to the waiting time distribution.


Proposition 3. The potential measure $U$ and the waiting time distribution
$R$ of a renewal sequence are related via


$U(B) = \sum_{m=0}^{\infty} R^{*m}(B), \quad B \subseteq \mathbb{Z}^+$,


where $R^{*0}$ is the delta distribution at 0.


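PROOF. By the Fubini Theorem,


$U(B) = E\left(\sum_{n \in B} X_n\right) = E\left(\sum_{n \in B} \sum_{m=0}^{\infty} I_{[T_m = n]}\right)$
$= \sum_{m=0}^{\infty} \sum_{n \in B} E(I_{[T_m = n]})$
$= \sum_{m=0}^{\infty} \sum_{n \in B} R^{*m}\{n\} = \sum_{m=0}^{\infty} R^{*m}(B)$


for all $B \subseteq \mathbb{Z}^+$. □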
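
Proposition 3 gives a direct, if inefficient, way to compute a potential sequence: add up the convolution powers of the waiting time density. The Python sketch below is illustrative and not part of the original text. The waiting time distribution used in the example ($R\{1\} = \tfrac12$, $R\{3\} = \tfrac14$, $R\{\infty\} = \tfrac14$) and the function names are assumptions made here. Because every step of $T$ is at least 1, $R^{*m}\{n\} = 0$ for $m > n$, so finitely many powers suffice for each $n$.

```python
from fractions import Fraction

def convolve(u, v, size):
    """First `size` terms of the convolution of the sequences u and v."""
    out = [Fraction(0)] * size
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            if i + j < size:
                out[i + j] += ui * vj
    return out

def potential_sequence(r, size):
    """u_0, ..., u_{size-1} with u_n = sum over m of R^{*m}{n}; r[n] = R{n}, r[0] = 0."""
    power = [Fraction(1)] + [Fraction(0)] * (size - 1)   # R^{*0} = delta at 0
    u = [Fraction(0)] * size
    for _ in range(size):                                # m = 0, ..., size - 1
        u = [a + b for a, b in zip(u, power)]
        power = convolve(power, r, size)                 # next convolution power
    return u

# Hypothetical waiting time density on the finite values; the rest of the mass is at infinity.
r = [Fraction(0), Fraction(1, 2), Fraction(0), Fraction(1, 4)]
print(potential_sequence(r, 5))
```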


We will explore further the relationship between a potential measure U and 
the corresponding waiting time distribution R by using generating functions. 
Consider a distribution $Q$ on $\bar{\mathbb{Z}}^+$. Recall that the probability generating function
$\rho$ of $Q$ is given by


(25.10) $\rho(s) = \sum_{n=0}^{\infty} Q\{n\} s^n, \quad 0 \le s \le 1$.


In order to extend the definition (25.10) to certain measures other than probability
measures, we remove 1 from the domain. The measure generating function
of a measure $U$ on $\mathbb{Z}^+$ that has a bounded density $(u_0, u_1, \dots)$ with respect to
counting measure is the function


(25.11) $s \mapsto \sum_{n=0}^{\infty} u_n s^n, \quad 0 \le s < 1$.


The following two problems extend essential parts of the theory of probability 
generating functions to measure generating functions. 


Problem 3. Suppose that $U$ and $V$ are measures on $\mathbb{Z}^+$ with bounded densities.
Show that if the measure generating functions of $U$ and $V$ are equal, then $U = V$.


Problem 4. Let $U$ and $V$ be measures on $\mathbb{Z}^+$. Suppose that $U$ has a bounded
density $(u_0, u_1, \dots)$ and that $V$ is a finite measure. Define a measure $U * V$ by


$(U * V)\{n\} = \sum_{k=0}^{n} u_k v_{n-k}$,


where $(v_0, v_1, \dots)$ denotes the density of $V$. Prove that $U * V$ has a bounded
density whose measure generating function is the product of the measure generating
functions of $U$ and $V$.


The measure $U * V$ described in the preceding problem is called the convolution
of $U$ and $V$; it can also be written with the finite measure first: $V * U$.

When we speak of the measure generating function of a measure that happens 
to be a probability measure it is to be understood that the domain is [0, 1) rather 
than [0,1] and that the probability attached to the one-point set {oo} does not 
enter the formula for the generating function. The following theorem would fail 
to be true in general were we to allow s = 1. 


Theorem 4. Let $R$ and $U$ be the waiting time distribution and potential measure,
respectively, of a renewal sequence and let $\varphi$ denote the measure generating
function of $R$. Then the measure generating function of $U$ is $1/(1 - \varphi)$.


* Problem 5. Prove the preceding theorem. 
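
As a quick sanity check of Theorem 4 (an illustrative special case, not taken from the text), suppose the waiting time is deterministic, $R = \delta_d$ for a fixed positive integer $d$. Renewals then occur exactly at the multiples of $d$, so $u_n = 1$ when $d$ divides $n$ and $u_n = 0$ otherwise, and the two generating functions can be compared directly:

```latex
\varphi(s) = s^{d}, \qquad
\sum_{n=0}^{\infty} u_n s^{n} = \sum_{k=0}^{\infty} s^{kd}
  = \frac{1}{1 - s^{d}} = \frac{1}{1 - \varphi(s)}, \qquad 0 \le s < 1.
```

The geometric series converges only for $s < 1$, which matches the decision to exclude $s = 1$ from the domain of a measure generating function.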


Theorem 5. Let $R$ and $U$ be the waiting time distribution restricted to $\mathbb{Z}^+$
and potential measure, respectively, of a renewal sequence. Then


$U = (U * R) + \delta_0$,


where $\delta_0$ denotes the standard delta distribution.


PROOF. By Problem 3, it suffices to prove equality of the corresponding measure
generating functions. By Theorem 4 the measure generating function of $U$
is $1/(1 - \varphi)$, where $\varphi$ denotes the measure generating function of $R$. So, the
measure generating function of $U * R$ is $\varphi/(1 - \varphi)$. Since the measure generating
function of $\delta_0$ is the constant function 1, our task is to show that


$\frac{1}{1 - \varphi} = \frac{\varphi}{1 - \varphi} + 1$.

But this follows immediately by simple algebra. □


Problem 6. Rewrite Theorem 5 using the densities of U and R rather than U and 
R themselves. 


Theorem 6. For $i = 1, 2$, let $R_i$ and $U_i$ denote the waiting time distribution
and potential measure of a renewal sequence, and let $\varphi_i$ denote the measure
generating function of $R_i$. Then the following three conditions are equivalent:


(i) $R_1 = R_2$;
(ii) $U_1 = U_2$;
(iii) $\varphi_1 = \varphi_2$.


PROOF. By Proposition 3, the first equality implies the second. By Theorem 4,
the second equality implies


$\frac{1}{1 - \varphi_1} = \frac{1}{1 - \varphi_2}$

and hence $\varphi_1 = \varphi_2$. That the third equality implies the first is the content of
Problem 3. □


25.3. Examples 


We begin with a series of exercises intended to help the reader become familiar 
with the concepts introduced so far. 


Problem 7. For $p \in [0, 1]$ let $X$ be a renewal sequence whose waiting time distribution
$R$ is given by

$R(\{2\}) = p = 1 - R(\{1\})$.

Calculate the potential sequence of $X$.


* Problem 8. Show that the sequence $(1, 0, \tfrac14, \tfrac14, \tfrac14, \dots)$ is a potential sequence and
find the corresponding waiting time distribution. 


Problem 9. For which values of $q$ is the sequence $(1, 0, q, q, q, \dots)$ a potential
sequence? For each such $q$ find the density of the corresponding waiting time
distribution.


Problem 10. Describe all potential sequences beginning with two 1’s: (1,1,...). 




Problem 11. For $q \in [0, 1]$ let $R$ be the waiting time distribution defined by
$R(\{1\}) = q = 1 - R(\{\infty\})$. Find the corresponding potential sequence. Describe
the corresponding regenerative set, minimizing the use of formulas in the
description.


Problem 12. For $p \in (0,1)$ let X be a renewal sequence with waiting time density
$n \mapsto (1-p)p^{n-1}$, $n > 0$. Calculate the corresponding potential sequence.


Problem 13. Use a sequence of independent Bernoulli random variables to create 
the renewal sequence described in the preceding exercise. 


Problem 14. Let $(X_0 = 1, X_1, X_2, \dots)$ be the renewal sequence described in
Problem 12. Show that $(1, 1 - X_1, 1 - X_2, \dots)$ is a renewal sequence and
calculate its potential sequence.


Problem 15. For $p \in (0,1)$ let $(Y_n: n = 1, 2, \dots)$ be an iid sequence of random
variables with

\[ P(\{\omega: Y_n(\omega) = 1\}) = 1 - P(\{\omega: Y_n(\omega) = -1\}) = p. \]

Let $(S_n: n = 0, 1, \dots)$ denote the corresponding sequence of partial sums. In each
of (i)-(vii) below, we define a random set $\Sigma$ in $\mathbb{Z}^+$. Determine in each case
whether $\Sigma$ is a regenerative set. For those cases in which the answer is affirmative,
calculate the corresponding potential sequence and waiting time distribution.
(i) $\{0\} \cup \{n: Y_n = 1\}$;
(ii) $\{0\} \cup \{n: n = 1 \text{ and } Y_1 = 1, \text{ or } n > 1 \text{ and } Y_n = Y_{n-1} = 1\}$;
(iii) $\{0\} \cup \{n: n = 1 \text{ and } Y_1 = 1, \text{ or } n > 1 \text{ and } Y_n \ne Y_{n-1}\}$;
(iv) $\{0\} \cup \{n > 1: -Y_{n-1} = Y_n = 1\}$;
(v) $\{0\} \cup \{n: n = 2 \text{ and } Y_2 = Y_1 = 1, \text{ or } n > 2 \text{ and } Y_n = Y_{n-1} \ne Y_{n-2}\}$;
(vi) $\{0\} \cup \{n > 2: -1 = Y_n = Y_{n-1} \ne Y_{n-2}\}$;
(vii) $\{0\} \cup \{n > 0: S_{n-1} < 0 \text{ and } S_n = 0\}$.
Hint: Partial fractions and the Binomial Theorem are useful tools when working
with generating functions.


The following theorem identifies another way in which renewal sequences
arise, namely as term-by-term products of independent renewal sequences.


Theorem 7. Let $X = (X_0, X_1, \dots)$ and $Y = (Y_0, Y_1, \dots)$ be independent
renewal sequences. Then the term-by-term product $XY = (X_0 Y_0, X_1 Y_1, \dots)$ is
a renewal sequence the potential sequence of which is the term-by-term product
$(u_0 v_0, u_1 v_1, \dots)$ of the potential sequences $(u_0, u_1, \dots)$ and
$(v_0, v_1, \dots)$ of X and Y.

PROOF. To show that XY is a renewal sequence we will apply Proposition 1.
Thus, for positive integers r and s and a sequence $(z_1, \dots, z_{r+s})$ of 0's and 1's
such that $z_r = 1$, we calculate

\[ P[X_n Y_n = z_n \text{ for } 0 < n \le r+s]
   = \sum P[X_r = Y_r = 1 \text{ and } X_n = x_n,\ Y_n = y_n \text{ for } 0 < n \le r+s,\ n \ne r], \tag{25.12} \]




where the sum is over all choices of sequences $(x_n)$ and $(y_n)$ whose terms equal
0 or 1 and which have the property that $z_n = x_n y_n$. Since X and Y are
independent, each term in the sum can be rewritten as a product:

\[ P[X_r = 1 \text{ and } X_n = x_n \text{ for } 0 < n \le r+s,\ n \ne r]
   \cdot P[Y_r = 1 \text{ and } Y_n = y_n \text{ for } 0 < n \le r+s,\ n \ne r]. \]

Since each of X and Y is a renewal sequence, this product of two factors can,
according to Proposition 1, be rewritten as a product of four factors:

\[ P[X_r = 1 \text{ and } X_n = x_n \text{ for } 0 < n < r]\; P[X_{n-r} = x_n \text{ for } r < n \le r+s] \]
\[ \cdot\, P[Y_r = 1 \text{ and } Y_n = y_n \text{ for } 0 < n < r]\; P[Y_{n-r} = y_n \text{ for } r < n \le r+s]. \]


Using the independence of X and Y again, we combine the first and third factors
and the second and fourth factors to obtain

\[ P[X_r = Y_r = 1 \text{ and } X_n = x_n,\ Y_n = y_n \text{ for } 0 < n < r]
   \cdot P[X_{n-r} = x_n \text{ and } Y_{n-r} = y_n \text{ for } r < n \le r+s]. \]

Inserting this product into the summation (25.12) and then summing gives

\[ P[X_r Y_r = 1 \text{ and } X_n Y_n = z_n \text{ for } 0 < n < r]
   \cdot P[X_{n-r} Y_{n-r} = z_n \text{ for } r < n \le r+s], \]

thus completing the argument showing that XY is a renewal sequence.
The value at n of the potential sequence of the renewal sequence XY equals

\[ P[X_n Y_n = 1] = P[X_n = 1]\, P[Y_n = 1] = u_n v_n, \]

as desired. □
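As a hedged numerical illustration of Theorem 7 (not from the original text), the sketch below multiplies two potential sequences term by term and recovers the density that the product would need as a waiting time density. The two waiting time densities and the helper names are assumptions chosen for the example; the theorem predicts that the recovered values are nonnegative with partial sums at most 1.

```python
from fractions import Fraction

def potential(r, n_max):
    """Potential sequence u_0..u_n_max from a waiting time density r (r[0] unused)."""
    u = [Fraction(1)]
    for n in range(1, n_max + 1):
        top = min(n, len(r) - 1)
        u.append(sum(r[k] * u[n - k] for k in range(1, top + 1)))
    return u

def waiting(u):
    """Density r_1..r_n implied by u via r_n = u_n - sum_{k<n} r_k u_{n-k}."""
    r = [Fraction(0)]
    for n in range(1, len(u)):
        r.append(u[n] - sum(r[k] * u[n - k] for k in range(1, n)))
    return r

# Hypothetical X: waiting time 1 or 2, each with probability 1/2.
u = potential([0, Fraction(1, 2), Fraction(1, 2)], 12)
# Hypothetical Y: waiting time 1 with probability 1/3, 3 with probability 2/3.
v = potential([0, Fraction(1, 3), 0, Fraction(2, 3)], 12)

w = [a * b for a, b in zip(u, v)]      # candidate potential sequence of XY
f = waiting(w)                         # implied waiting time density of XY
print(f[1:])
```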


Problem 16. Restate the preceding theorem using the language of regenerative sets 
rather than that of renewal sequences. 


Problem 17. Apply Theorem 7 to a variety of pairs of independent renewal se- 
quences. 


Problem 18. Theorem 7 asserts that the term-by-term minimum of two indepen- 
dent renewal sequences is a renewal sequence. Show that the term-by-term maxi- 
mum of two independent renewal sequences is not necessarily a renewal sequence. 




25.4. Renewal theory: a first step 


Our goal in this section is to obtain information about the random variable 
N(B) for various sets B. In all cases, B will be a set of consecutive nonnegative 
integers, so N(B) will equal the number of renewals that occur during some time 
interval. 

Our first result concerns the case in which B is of the form $\{k+1, \dots, k+l\}$
for some integers $k, l \ge 0$, and gives a formula for $P[N(B) > 0]$, which is the
probability that a renewal occurs between the times $k+1$ and $k+l$.


Theorem 8. Let R, $(u_n: n = 0, 1, 2, \dots)$, and N denote the waiting time
distribution, the potential sequence, and the renewal measure, respectively, of
some renewal sequence. For k and l nonnegative integers,

\[ P[N[k+1, k+l] > 0] = \sum_{n=0}^{k} R[k+1-n,\ k+l-n]\, u_n \]
\[ \qquad = 1 - \sum_{n=0}^{k} R[k+l+1-n,\ \infty]\, u_n \]
\[ \qquad = \sum_{n=k+1}^{k+l} R[k+l+1-n,\ \infty]\, u_n . \]


PARTIAL PROOF. We will only prove the first of the three equalities. Denoting
the random walk corresponding to X by T, we have

\[ P[N[k+1, k+l] > 0] = \sum_{j=0}^{\infty} P[T_j \le k,\ k+1 \le T_{j+1} \le k+l] \]
\[ \qquad = \sum_{j=0}^{\infty} \sum_{n=0}^{k} R[k+1-n,\ k+l-n]\, R^{*j}\{n\} \]
\[ \qquad = \sum_{n=0}^{k} R[k+1-n,\ k+l-n]\, u_n , \]

the last step being based on the Fubini Theorem. □
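The three expressions in Theorem 8 can be compared numerically. The sketch below is an illustration under assumed data, not part of the original text: the waiting time distribution (with an atom at infinity) and the function names are hypothetical, and exact rational arithmetic is used so that the three values can be tested for equality.

```python
from fractions import Fraction

# Hypothetical waiting time distribution with mass at infinity.
r = {1: Fraction(1, 2), 3: Fraction(1, 4)}     # R({1}), R({3})
r_inf = Fraction(1, 4)                         # R({infinity})

def R_interval(a, b):
    """R[a, b] for finite integers a <= b."""
    return sum(p for k, p in r.items() if a <= k <= b)

def R_tail(a):
    """R[a, infinity], the atom at infinity included."""
    return sum(p for k, p in r.items() if k >= a) + r_inf

def potential(n_max):
    u = [Fraction(1)]
    for n in range(1, n_max + 1):
        u.append(sum(p * u[n - k] for k, p in r.items() if k <= n))
    return u

def three_forms(k, l):
    """Evaluate the three right-hand sides of Theorem 8 for given k and l."""
    u = potential(k + l)
    first = sum(R_interval(k + 1 - n, k + l - n) * u[n] for n in range(k + 1))
    second = 1 - sum(R_tail(k + l + 1 - n) * u[n] for n in range(k + 1))
    third = sum(R_tail(k + l + 1 - n) * u[n] for n in range(k + 1, k + l + 1))
    return first, second, third

results = {(k, l): three_forms(k, l) for k in range(6) for l in range(6)}
print(results[(1, 1)])
```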


Problem 19. Complete the proof of the preceding theorem by proving the other 
two equalities. 


Problem 20. Apply Theorem 8 to renewal sequences with geometrically distributed 
waiting times. Then arrive at the same conclusion by using Problem 13. 


Problem 21. Apply Theorem 8 to the renewal sequence of Problem 11. 


Problem 22. Prove the following corollary. 




Corollary 9. For each $k \in \mathbb{Z}^+$,

\[ \sum_{n=0}^{k} R[k+1-n,\ \infty]\, u_n = 1. \]

The preceding result will be used in the proofs of Theorem 15 and Theorem 16 
in this chapter. 

We now turn our attention to the quantity $N(\mathbb{Z}^+)$, which equals the total
number of renewals made by the renewal sequence. The following result, which
generalizes Corollary 13 of Chapter 11, shows how to determine the distribution
of $N(\mathbb{Z}^+)$ when either $U(\mathbb{Z}^+)$ or $R\{\infty\}$ is known. The proof is
requested in Problem 23.


Theorem 10. Let X be a renewal sequence with potential measure U and
waiting time distribution R. If $U(\mathbb{Z}^+) < \infty$, then the distribution of
$N(\mathbb{Z}^+)$ is of geometric type, supported by $\{1, 2, \dots\}$, with mean
$U(\mathbb{Z}^+)$. If $U(\mathbb{Z}^+) = \infty$, then $N(\mathbb{Z}^+) = \infty$ a.s. In any case,

\[ U(\mathbb{Z}^+) = \frac{1}{R\{\infty\}}, \]

where it is to be understood that $1/0 = \infty$.


According to the preceding result, the random variable $N(\mathbb{Z}^+)$ either is finite
a.s. or else it is infinite a.s. The following definition introduces terminology for 
these two cases. 


Definition 11. A renewal sequence and the corresponding regenerative set 
are transient if the corresponding renewal measure is finite almost surely and 
they are recurrent otherwise. 
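A small numerical sketch may help make the identity $U(\mathbb{Z}^+) = 1/R\{\infty\}$ of Theorem 10 concrete in the transient case. The waiting time distribution below is an assumption made for illustration, and the infinite sum is truncated at a point where the remaining terms are negligible for this particular choice.

```python
# Hypothetical transient example: R({1}) = 0.3, R({2}) = 0.5, R({infinity}) = 0.2.
r = {1: 0.3, 2: 0.5}
r_inf = 1.0 - sum(r.values())

n_max = 400                      # truncation point; u_n decays geometrically here
u = [1.0]
for n in range(1, n_max + 1):
    u.append(sum(p * u[n - k] for k, p in r.items() if k <= n))

total = sum(u)                   # truncated value of U(Z+)
print(total, 1.0 / r_inf)
```

With these values the mean number of renewals is about 5, in agreement with a geometric-type distribution whose success probability is $R\{\infty\}$.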


Problem 23. Prove Theorem 10. 


Problem 24. Find a formula for the distribution function of the maximum member 
of a transient regenerative set in terms of its potential measure and waiting time 
distribution. In particular, check that your formula correctly gives the probability 
that the maximum equals 0. 


Our next theorem concerns the asymptotic properties of the random variable
$N(\{0, 1, \dots, n\})$ as $n \to \infty$.


Theorem 12. [Strong Law for Renewal Sequences] Let N, U, and $\mu \in [1, \infty]$
denote the renewal measure, the potential measure, and the mean waiting time,
respectively, of some renewal sequence. Then

\[ \lim_{n\to\infty} \frac{U(\{0,1,\dots,n\})}{n} = \lim_{n\to\infty} \frac{N(\{0,1,\dots,n\})}{n} = \frac{1}{\mu} \quad \text{a.s.} \]




PROOF. Since $N(\{0,1,\dots,n\}) \le n+1$, the first equality is a consequence of the
second equality and the Bounded Convergence Theorem.
To prove the second equality let T be the random walk corresponding to U. Then

\[ n \in [T_{m-1}(\omega),\, T_m(\omega)) \implies \frac{N(\{0,1,\dots,n\})(\omega)}{n} = \frac{m}{n}. \]

By the Strong Law of Large Numbers, $T_m/m \to \mu$ a.s. as $m \to \infty$. The second
equality now follows easily. □


In terms of renewal sequences X, the preceding theorem says that

\[ \lim_{n\to\infty} \frac{\sum_{k=0}^{n} X_k}{n} = \frac{1}{\mu} \quad \text{a.s.,} \]

a fact justifying the term ‘Strong Law’.
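The Strong Law can be watched in a simulation. The following is only a sketch under stated assumptions: the waiting times (1 or 3, each with probability 1/2, so that the mean is 2), the seed, and the function name are all illustrative choices rather than anything prescribed by the text.

```python
import random

def count_renewals(n, times, weights, rng):
    """Return N({0,...,n}): the number of renewals at times 0..n, the one at 0 included."""
    t = 0
    count = 0
    while t <= n:
        count += 1                                    # a renewal occurs at time t
        t += rng.choices(times, weights=weights)[0]   # add an independent waiting time
    return count

rng = random.Random(2024)                             # fixed seed, chosen arbitrarily
times, weights = [1, 3], [0.5, 0.5]
mu = sum(t * w for t, w in zip(times, weights))       # mean waiting time
n = 200_000
ratio = count_renewals(n, times, weights, rng) / n
print(ratio, 1 / mu)
```

For large n the printed ratio should be close to $1/\mu$, although any single run only approximates the almost sure limit.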

If X is transient, we already know from Theorem 10 that $N(\mathbb{Z}^+) < \infty$ a.s.
and $\mu = \infty$ (since $R\{\infty\} > 0$), so Theorem 12 does not tell us anything new in
that case.

If X is recurrent, Theorem 12 tells us that except for sample points $\omega$ in
some null event, the sequence $X(\omega)$ has a well-defined ‘frequency’ of 1’s. This
frequency is 0 if the mean waiting time is infinite; otherwise, it is the positive
real number $1/\mu$. It is useful to have terminology to distinguish these last two
cases.


Definition 13. A recurrent renewal sequence is null recurrent if the mean 
waiting time is infinite; otherwise, it is positive recurrent. 


After we prove the Renewal Theorem later in this chapter, we will obtain a 
criterion in terms of the potential measure U for distinguishing null recurrence 
and positive recurrence. 


25.5. Delayed renewal sequences 


Let $S = (S_0 = 0, S_1, S_2, \dots)$ be a random walk in Z and for a fixed integer z
consider the random sequence $X = (X_0, X_1, X_2, \dots)$ defined by

\[ X_n = I_{\{z\}} \circ S_n . \]

Thus, the sequence X marks the visits made by S to z. If $z \ne 0$, X is not a
renewal sequence. However, once the point z is reached by S, then thereafter X
behaves like a renewal sequence.

To make this idea precise and to introduce an appropriate name we consider
an arbitrary random sequence X of 0’s and 1’s, and let $T = \inf\{n \ge 0: X_n = 1\}$.
Then X is called a delayed renewal sequence if either $P[T = \infty] = 1$, or
$P[T = \infty] < 1$ and conditioned on the event $[T < \infty]$, the sequence

\[ Y = (X_{T+n}: n = 0, 1, 2, \dots) \]

is a renewal sequence. The random variable T is called the delay time, and the
distribution of T is the delay distribution of X. The waiting time distribution of




Y is also the waiting time distribution of X. Clearly a delayed renewal sequence
X is a renewal sequence if and only if $P[X_0 = 1] = 1$.


Problem 25. For a simple random walk S in Z, let X be the delayed renewal
sequence defined by $X_n = I_{\{z\}} \circ S_n$ for some $z \in$ Z. Find the probability
generating function of the delay distribution of X and, from it, the probability
that the delay equals $\infty$.


Problem 26. State and prove a version of Theorem 12 for delayed renewal se- 
quences. 


Problem 27. Prove the following analogue of Proposition 1: Let $W = (W_n: n \ge 0)$
be a renewal sequence. A random sequence $X = (X_0, X_1, X_2, \dots)$ of 0’s and 1’s
is a delayed renewal sequence with the same waiting time distribution as W if and
only if

\[ P[X_n = x_n \text{ for } 0 \le n \le r+s]
   = P[X_n = x_n \text{ for } 0 \le n \le r]\; P[W_{n-r} = x_n \text{ for } r < n \le r+s] \]

for all positive integers r and s and sequences $(x_0, \dots, x_{r+s})$ of 0’s and 1’s such
that $x_r = 1$.


There is an important technique, known as coupling, that can be used to
compare two delayed renewal sequences, provided they have the same waiting 
time distribution. Previously in this book, versions of this technique occurred in 
Theorem 12 of Chapter 11 and Proposition 5 of Chapter 14, although they were 
not identified as such. 

Let X and Y be independent delayed renewal sequences, both with waiting
time distribution R. Let $M = \inf\{n \ge 0: X_n = Y_n = 1\}$, and define

\[ Z_n = \begin{cases} X_n & \text{if } n < M \\ Y_n & \text{if } n \ge M. \end{cases} \]


We will prove that the random sequence $Z = (Z_n: n \ge 0)$ has the same
distribution as X. This result is particularly useful when $M < \infty$ a.s., because in
that case, the two sequences Y and Z clearly have the same asymptotic behavior
as $n \to \infty$, allowing us to conclude that X and Y have the same asymptotic
behavior.


Proposition 14. Let X,Y, and Z be as above. Then X and Z have the same 
distribution. 


PROOF. Fix an arbitrary finite sequence $x_0, x_1, \dots, x_n$ of 0’s and 1’s. We
must show that

\[ P[X_0 = x_0, \dots, X_n = x_n] = P[Z_0 = x_0, \dots, Z_n = x_n]. \]




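Let $M' = n \wedge \inf\{k: Y_k = x_k = 1\}$. Since X and Y are independent, X and
$(Y, M')$ are independent. It is easily checked that for
$\omega \in [Z_0 = x_0, \dots, Z_n = x_n]$, $M'(\omega) = n \wedge M(\omega)$. These two facts
imply that

\[ P[Z_0 = x_0, \dots, Z_n = x_n]
   = \sum_{k=0}^{n} P[M' = k,\ Z_0 = x_0, \dots, Z_n = x_n] \]
\[ \qquad = \sum_{k=0}^{n} P[M' = k,\ X_0 = x_0, \dots, X_k = x_k,\ Y_{k+1} = x_{k+1}, \dots, Y_n = x_n] \]
\[ \qquad = \sum_{k=0}^{n} \bigl( P[X_0 = x_0, \dots, X_k = x_k]\; P[M' = k,\ Y_{k+1} = x_{k+1}, \dots, Y_n = x_n] \bigr). \]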
k=0 


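Note that the event $[M' = k]$ is empty for $k < n$ such that $x_k = 0$, so (25.13)
below holds trivially for such k, both sides being 0. For $k < n$ such that
$x_k = 1$ we have by Problem 27 that

\[ P[M' = k,\ Y_{k+1} = x_{k+1}, \dots, Y_n = x_n]
   = P[M' = k]\; P[W_1 = x_{k+1}, \dots, W_{n-k} = x_n], \tag{25.13} \]

where $W = (W_n: n \ge 0)$ is a (nondelayed) renewal sequence with waiting time
distribution R. This equality also holds trivially when $k = n$, since then it
reduces to $P[M' = n] = P[M' = n]$. Substituting (25.13) into the expression
preceding it gives

\[ P[Z_0 = x_0, \dots, Z_n = x_n]
   = \sum_{k=0}^{n} \bigl( P[M' = k]\; P[X_0 = x_0, \dots, X_k = x_k]\; P[W_1 = x_{k+1}, \dots, W_{n-k} = x_n] \bigr). \]

By Problem 27, this last expression equals

\[ \sum_{k=0}^{n} P[M' = k]\; P[X_0 = x_0, \dots, X_n = x_n] = P[X_0 = x_0, \dots, X_n = x_n], \]

the final equality holding because $M'$ takes values in $\{0, 1, \dots, n\}$. This is the
desired conclusion. □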


The previous result will be used in the next section in the proof of the Renewal 
Theorem. In the proof of the positive recurrent case of that result, the special 
delayed renewal sequences identified in the following theorem are particularly 
important. 


Theorem 15. Let R be a waiting time distribution with finite mean $\mu$ and let X
be a delayed renewal sequence corresponding to R and the delay distribution D
defined by

\[ D\{n\} = \frac{1}{\mu}\, R[n+1, \infty), \quad n \in \mathbb{Z}^+. \]

Then $P[X_n = 1] = 1/\mu$ for all nonnegative integers n.




PROOF. By Corollary 20 of Chapter 4, D is a distribution, as asserted in the 
theorem. 

Let $(u_0, u_1, u_2, \dots)$ be the potential sequence corresponding to R (and thus
to the corresponding undelayed renewal sequence) and let $T_0$ have distribution
D. By definition, $P[X_n = 1 \mid T_0 = k] = u_{n-k}$ for $k \le n$ and $= 0$ for $k > n$.
Hence,

\[ P[X_n = 1] = \sum_{k=0}^{n} u_{n-k}\, D\{k\} = \frac{1}{\mu} \sum_{k=0}^{n} u_{n-k}\, R[k+1, \infty), \]

which by Corollary 9 and the fact that $R\{\infty\} = 0$, equals $1/\mu$, as desired. □
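The conclusion of Theorem 15 can be checked exactly for a small example. The sketch below assumes the delay density has the form $D\{k\} = R[k+1,\infty)/\mu$ used in the proof above; the waiting time density on {1, 2, 3} and the helper names are hypothetical.

```python
from fractions import Fraction

# Hypothetical waiting time density with finite mean and no mass at infinity.
r = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 4)}
mu = sum(k * p for k, p in r.items())             # mean waiting time

def tail(a):
    """R[a, infinity); there is no atom at infinity in this example."""
    return sum(p for k, p in r.items() if k >= a)

def delay(k):
    """Assumed delay density D{k} = R[k+1, infinity) / mu."""
    return tail(k + 1) / mu

n_max = 10
u = [Fraction(1)]                                 # potential sequence of the undelayed sequence
for n in range(1, n_max + 1):
    u.append(sum(p * u[n - k] for k, p in r.items() if k <= n))

# P[X_n = 1] = sum over delays k <= n of u_{n-k} D{k}
prob_renewal = [sum(u[n - k] * delay(k) for k in range(n + 1))
                for n in range(n_max + 1)]
print(prob_renewal, 1 / mu)
```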


Problem 28. Let $X = (X_0, X_1, X_2, \dots)$ be a delayed renewal sequence of the form
described in Theorem 15. Prove that the various sequences $(X_k, X_{k+1}, X_{k+2}, \dots)$
all have the same distribution. [Thus X is an example of a ‘stationary sequence’,
to be defined in Chapter 28. It is a ‘stationary renewal sequence’.]


* Problem 29. For a delayed renewal sequence of the form described in Theorem 15, 
find a formula for the mean of the delay in terms of the mean (assumed to be finite) 
and variance (not necessarily finite) of the waiting time distribution. 


Problem 30. [Bus-stop paradox] Let $X = (X_0, X_1, X_2, \dots)$ be a delayed renewal
sequence as described in Theorem 15, and set $V = \min\{n > 0: X_n = 1\}$. Suppose
that X represents the times at which buses come to a particular bus stop. Then
V can be interpreted as the time one must wait for the next bus provided that one
arrives at the bus stop just after time 0. Find necessary and sufficient conditions
on the mean and variance of the waiting time distribution in order that

\[ E(V \mid X_0 = 0) > E(V \mid X_0 = 1). \]


Thus when the requested conditions hold, the conditional mean of the time to wait 
for the next bus is less if one arrives just as a bus is pulling away from the bus 
stop than it is if one arrives with no bus in sight. Give an intuitive explanation 
that agrees with the mathematics and makes this seemingly paradoxical conclusion 
appear natural. If one goes to the bus stop just after some fixed positive (integral) 
time rather than just after time 0, are the conclusions the same? 


25.6. The Renewal Theorem 


Recall, from Theorem 12, that the potential measure U of a renewal sequence
satisfies

\[ \lim_{n\to\infty} \frac{U(\{0,1,\dots,n\})}{n} = \frac{1}{\mu}, \]

where $\mu \in [1, \infty]$ denotes the mean of the waiting time distribution. Thus,
relative to n, $(n-1)/\mu$ and $n/\mu$ are close to $U(\{0,1,\dots,n-1\})$ and
$U(\{0,1,\dots,n\})$, respectively, when n is large. By subtracting we arrive at the
conjecture that $1/\mu$ is a good approximation of $u_n = U(\{n\})$ when n is large.
Although this




conjecture is false as it stands, the next theorem gives a precise statement in
this direction. For the statement we need a concept. The period of a renewal
sequence is the greatest common divisor of the support of the corresponding
potential measure, with the greatest common divisor being defined to equal
$\infty$ in case the support equals {0}. It is easy to see that the period is also the
greatest common divisor of the support of the waiting time distribution. A renewal
sequence is aperiodic if its period is 1.


Theorem 16. [Renewal] Let $\mu \in [1, \infty]$ and $(u_n: n \ge 0)$ be the mean waiting
time and potential sequence, respectively, of a renewal sequence with finite period
$\gamma$. Then

\[ \lim_{k\to\infty} u_{k\gamma} = \frac{\gamma}{\mu}. \]

BEGINNING OF PROOF. In the transient case $\mu = \infty$; also $\sum u_n < \infty$ by
Theorem 10 and, hence, $u_n \to 0$. This completes the proof for the transient case
and we hereafter restrict consideration to the recurrent case.

Treating multiples of $\gamma$ is equivalent to treating all members of $\mathbb{Z}^+$ for the
renewal distribution $\hat{R}$ defined by $\hat{R}(B) = R(\gamma B)$ and having mean
$\mu/\gamma$. Therefore, we may, without loss of generality, assume aperiodicity
throughout the remainder of the proof.

Clearly $u_n > 0$ if and only if n is a linear combination of members of the
support of R with $\mathbb{Z}^+$-valued coefficients. It follows from the forthcoming
Lemma 18 that $u_n > 0$ for all sufficiently large n. □


For the remainder of the proof, we consider the positive recurrent and null 
recurrent cases separately. 


END OF PROOF IN POSITIVE RECURRENT CASE. By Theorem 15 there exists
a delayed renewal sequence $Y = (Y_n: n = 0, 1, 2, \dots)$ independent of X
having waiting time distribution R and satisfying $P[Y_n = 1] = 1/\mu$ for all n. Set

\[ M(\omega) = \inf\{n: X_n(\omega) = Y_n(\omega) = 1\}, \]

and define $Z = (Z_n: n = 0, 1, 2, \dots)$ by

\[ Z_n(\omega) = \begin{cases} X_n(\omega) & \text{if } n < M(\omega) \\ Y_n(\omega) & \text{if } n \ge M(\omega). \end{cases} \]

By Proposition 14, X and Z have the same distribution.

Our goal is to prove that $M < \infty$ a.s., for then it will follow that $Y_n - Z_n \to 0$
and hence that $E(Y_n) - E(Z_n) = \frac{1}{\mu} - E(X_n) \to 0$, thus finishing the proof in the
positive recurrent case. Let $K = \min\{k: Y_k = 1 \text{ and } u_k > 0\}$, a random variable
that is finite almost surely because X is recurrent and $u_k > 0$ for all sufficiently
large k. By considering the random walk and delay time corresponding to the
sequence Y, it is easy to see that X, K, and the random sequence $(Y_{n+K}: n \ge 0)$
are independent, and that $(Y_{n+K}: n \ge 0)$ has the same distribution as X. Thus,
by Theorem 7, the random sequence $(X_n Y_{n+K}: n \ge 0)$ is a renewal sequence that




is independent of K, and the corresponding potential sequence is $(u_n^2: n \ge 0)$.
The positive recurrence of X and Theorem 12 imply that $u_n \not\to 0$ and hence that
$\sum u_n^2 = \infty$, so this renewal sequence is recurrent by Theorem 10.

In other words, with probability 1 there exist infinitely many times n such
that $X_n = Y_{n+K} = 1$. Let $S_0 = 0 < S_1 < S_2 < \cdots$ be that random sequence of
times. Since this sequence is strictly increasing, $S_{jk} + k \le S_{k(j+1)}$ for all $k > 0$
and $j \ge 0$. Therefore, repeated applications of Proposition 1 imply that for each
positive integer k, the events

\[ [X_{S_{jk}+k} = 1], \quad j = 0, 1, 2, \dots \]

are independent and have probability $u_k$. Since K is independent of X, the
events

\[ [X_{S_{jK}+K} = 1], \quad j = 0, 1, 2, \dots \]

are conditionally independent given K, with conditional probability $u_K$. By
the definition of K, $u_K > 0$, so the conditional version of the Borel-Cantelli
Lemma implies that with probability 1, infinitely many of these events occur.
Each such occurrence corresponds to a time at which the sequences X and Y
are simultaneously equal to 1, so $M < \infty$ a.s. as desired. □


END OF PROOF IN NULL RECURRENT CASE. For a proof by contradiction,
suppose that for some $\varepsilon > 0$ there are infinitely many n for which $u_n > \varepsilon$.

Since $\mu = \infty$, Corollary 20 of Chapter 4 implies the existence of an integer q
such that

\[ \sum_{n=1}^{q+1} R[n, \infty) > \frac{2}{\varepsilon}. \tag{25.14} \]

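Consider $q + 1$ independent delayed renewal sequences $Y^{(r)}$, $0 \le r \le q$, with
$Y^{(0)} = X$ and, for $1 \le r \le q$, $Y^{(r)}$ having a nonrandom delay equal to r and
having the same waiting time distribution as X. Loosely mimicking the proof
for the case of positive recurrence, we set

\[ M(\omega) = \inf\{n: Y_n^{(r)}(\omega) = 1 \text{ for } 0 \le r \le q\} \]

and define

\[ Z_n^{(r)}(\omega) = \begin{cases} Y_n^{(r)}(\omega) & \text{if } n < M(\omega) \\ X_n(\omega) & \text{if } n \ge M(\omega). \end{cases} \]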


As for the case of positive recurrence, here each $Z^{(r)}$ has the same distribution
as the corresponding $Y^{(r)}$. Since $u_n > \varepsilon$ for infinitely many n,
$\sum u_n^{q+1} = \infty$. Hence, $P[M < \infty] = 1$, as can be seen by using an argument
similar to that used for the case of positive recurrence. Therefore, there exists a
positive integer $k > q$ such that

\[ u_k = P[X_k = 1] > \varepsilon \quad \text{and} \quad P[Z_k^{(r)} \ne X_k] < \frac{\varepsilon}{2} \ \text{ for } 1 \le r \le q. \]




It follows that

\[ P[Y_k^{(r)} = 1] > \frac{\varepsilon}{2}; \]

that is,

\[ u_{k-r} > \frac{\varepsilon}{2} \quad \text{for } 0 \le r \le q. \tag{25.15} \]

From (25.14) and (25.15) we obtain

\[ \sum_{n=1}^{q+1} u_{k+1-n}\, R[n, \infty) > 1. \]

We have reached the desired contradiction since this last inequality is inconsistent
with Corollary 9, as can be seen by a change of variables. □
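The role of the period in the Renewal Theorem shows up clearly in a small computation. The sketch below is illustrative only: it assumes a waiting time distribution concentrated on {2, 4}, so that the period is 2 and the mean is 3, and it computes the potential sequence by the usual renewal recursion.

```python
from functools import reduce
from math import gcd

# Hypothetical periodic example: R({2}) = R({4}) = 1/2.
r = {2: 0.5, 4: 0.5}
mu = sum(k * p for k, p in r.items())      # mean waiting time
period = reduce(gcd, r)                    # gcd of the support of R

n_max = 200
u = [1.0]
for n in range(1, n_max + 1):
    u.append(sum(p * u[n - k] for k, p in r.items() if k <= n))

limit = period / mu                        # value predicted for u_{k * period}
print(u[196:201], limit)
```

In this example the odd-indexed terms vanish, while the even-indexed terms approach $\gamma/\mu = 2/3$ rather than $1/\mu$.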


The following is an immediate consequence of Theorem 10 and the Renewal 
Theorem. 


Corollary 17. Let X be a renewal sequence with potential sequence
$(u_n: n \ge 0)$. Then X is recurrent if and only if $\sum_n u_n = \infty$, in which case X is
null recurrent if and only if $\lim_{n\to\infty} u_n = 0$.


We conclude this section with a proof of the number-theoretical fact used near 
the beginning of the proof of the Renewal Theorem. 


Lemma 18. Let B be a set of positive integers having greatest common
divisor equal to 1. Then there exists a positive integer k such that every integer
$n \ge k$ can be written as a linear combination of members of B with coefficients
belonging to $\mathbb{Z}^+$.


PROOF. Let H denote the set of linear combinations of members of B having
coefficients in $\mathbb{Z}^+$, and set $G = \{n - k: k, n \in H\}$. It is easy to check that G
also equals the set of all linear combinations of members of B having coefficients
in Z. The Euclidean algorithm gives a method of representing the greatest
common divisor of two positive integers as a linear combination of those integers
with coefficients in Z. Repeated use (needed only finitely many times even if
$\#(B) = \infty$) of this fact shows that 1 is a linear combination of members of B
having coefficients in Z. That is, $1 \in G$ and so $1 = q - p$ for some members p
and q of H.

We will finish the proof by using mathematical induction to show that every
$n \ge pq$ can be represented as a linear combination of p and q having coefficients
in $\mathbb{Z}^+$. It is clearly true if $n = pq$. Suppose, as the induction hypothesis, that
$k = ap + bq \ge pq$, and $a, b \in \mathbb{Z}^+$. Then

\[ k + 1 = (a-1)p + (b+1)q, \]




thus finishing the induction argument in case $a \ne 0$. The following calculation
showing that the case $a = 0$ can be reduced to the case $a \ne 0$ completes the
proof:


k = bq = qp + (b—p)q, 
which is a linear combination of p and q with positive coefficient on p and non- 
negative coefficient on q. O 
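Lemma 18 can be illustrated by brute force. In the sketch below the set B = {6, 10, 15} is an arbitrary example with greatest common divisor 1, and the function name is hypothetical. The code marks which integers up to a limit are combinations of members of B with nonnegative integer coefficients and reports the largest one that is not. Following the proof, one may take p = 15 and q = 16 = 6 + 10 in H, so every n at least pq = 240 must be representable.

```python
def representable(members, limit):
    """reach[n] is True iff n is a combination of members with coefficients 0, 1, 2, ..."""
    reach = [False] * (limit + 1)
    reach[0] = True                        # the empty combination
    for n in range(1, limit + 1):
        reach[n] = any(b <= n and reach[n - b] for b in members)
    return reach

B = [6, 10, 15]                            # illustrative set with gcd 1
limit = 300
reach = representable(B, limit)
largest_gap = max(n for n in range(limit + 1) if not reach[n])
print(largest_gap)
```

The bound pq from the proof is far from sharp in this example, which is typical: the lemma only asserts that some threshold exists.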


25.7. + Applications to random walks 


We have already seen that there are several regenerative sets that are useful for 
understanding random walks in R. In this section, we will investigate properties 
of some of these sets. 

We begin by treating the zero set of an arbitrary random walk
$S = (S_n: n \in \mathbb{Z}^+)$ in Z, that is, by treating the regenerative set defined in
(25.3). The corresponding renewal sequence X is defined by $X_n = I_{\{0\}} \circ S_n$. The
random walk S is called transient, positive recurrent, or null recurrent according
to which phrase applies to X.

The potential sequence $(u_n: n \ge 0)$ of X is easily seen to be related to the
step distribution Q of S by $u_n = Q^{*n}\{0\}$. By Theorem 13 of Chapter 13 we
obtain

\[ u_n = \frac{1}{2\pi} \int_{-\pi}^{\pi} \beta^n(v)\, dv, \]

where $\beta$ is the characteristic function of Q. Multiplying by $s^n$ and summing
gives

\[ \frac{1}{1 - \varphi(s)} = \sum_{n=0}^{\infty} u_n s^n = \frac{1}{2\pi} \int_{-\pi}^{\pi} \frac{1}{1 - s\beta(v)}\, dv, \]

where $\varphi$ is the measure generating function of the waiting time distribution of
X. We state this conclusion formally. (The essence of this result is also found in
Section 10 of Chapter 13. However, the corollary after the result adds something
new that is not found in Chapter 13.)
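The inversion formula for $u_n$ can be tried out numerically. The sketch below assumes the simple symmetric random walk, whose step distribution gives probability 1/2 to each of +1 and -1 and whose characteristic function is therefore $\beta(v) = \cos v$. The integral is approximated by an equally spaced Riemann sum, and the result is compared with the familiar binomial expression for the probability of being at 0 at time n. The function name and grid size are illustrative choices.

```python
import math

def return_probability(n, beta, grid=256):
    """Approximate u_n = (1 / 2 pi) * integral over [-pi, pi] of beta(v)**n dv."""
    total = sum(beta(-math.pi + 2 * math.pi * j / grid) ** n for j in range(grid))
    return total / grid

# Simple symmetric random walk: beta(v) = cos(v).
approx = [return_probability(n, math.cos) for n in range(11)]

# Direct count: u_n = C(n, n/2) / 2^n for even n, and 0 for odd n.
exact = [math.comb(n, n // 2) / 2 ** n if n % 2 == 0 else 0.0 for n in range(11)]
print(approx)
print(exact)
```

For integrands that are trigonometric polynomials of low degree, as here, an equally spaced sum with enough nodes reproduces the integral up to rounding error.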


Theorem 19. Let $S = (S_0 = 0, S_1, S_2, \dots)$ be a random walk in Z, and let
$T_1 = \inf\{n > 0: S_n = 0\}$. Then the distribution of $T_1$ has measure generating
function $\varphi$ given by

\[ \varphi(s) = 1 - \left[ \frac{1}{2\pi} \int_{-\pi}^{\pi} \frac{1}{1 - s\beta(v)}\, dv \right]^{-1}, \]

where $\beta$ is the characteristic function of the step distribution of S.


Corollary 20. Except for the identically zero random walk, every random 
walk on Z is either transient or null recurrent. 


Problem 31. Prove the preceding corollary. 




Problem 32. Apply Theorem 19 to each simple random walk in Z to determine 
transience and null recurrence, and to calculate the distribution of the first return 
time to 0 of the random walk. (Of course, the results of this problem have also 
been obtained earlier by other means. See, for example, Problem 33 of Chapter 5.) 


We now consider the regenerative sets (25.4) and (25.7) identified in
Proposition 2 as arising from a random walk $(S_0 = 0, S_1, S_2, \dots)$ in R:

\[ \{n: S_n > S_k \text{ for } 0 \le k < n\}, \]

the set of strict ascending ladder times; and

\[ \{n: S_n \le S_k \text{ for } 0 \le k < n\}, \]

the set of weak descending ladder times. Our first goal is to obtain a relation
between the probability generating functions $\varphi^{++}$ and $\varphi^{-}$ of the waiting time
distributions for the strict ascending ladder times and the weak descending ladder
times. Here is the key tool.


Lemma 21. Let $n \ge 0$. For a random walk in R the probability that n is a
strict ascending ladder time equals the probability that there is no positive weak
descending ladder time less than or equal to n.


PROOF. The result is obvious for $n = 0$. Fix $n > 0$. Denote the partial sums
and the steps of the random walk by $S_k$ and $Y_k$, respectively. The event that n
is a strict ascending ladder time equals

\[ \Bigl[\, \sum_{k=m}^{n} Y_k > 0 \ \text{ for } m = 1, 2, \dots, n \Bigr]. \]

By using the fact that the random variables $Y_k$ are independent and identically
distributed we see that we can replace $Y_k$ by $Y_{n+1-k}$ for $k = 1, 2, \dots, n$ without
changing the probability. Thus the probability that n is a strict ascending ladder
time equals

\[ P\Bigl[\, \sum_{k=1}^{m} Y_k > 0 \ \text{ for } m = 1, 2, \dots, n \Bigr], \]

which is the probability that none of the integers $1, 2, \dots, n$ is a weak descending
ladder time. □
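Lemma 21 can be confirmed by exhaustive enumeration for a small example. The sketch below is an illustration, not part of the original argument: it assumes a simple random walk with steps +1 (probability p) and -1 (probability 1 - p), enumerates all step sequences of length n, and computes both probabilities exactly. The function name and the value p = 1/3 are arbitrary choices.

```python
from fractions import Fraction
from itertools import product

def ladder_probabilities(n, p):
    """Return (P[n is a strict ascending ladder time],
               P[no weak descending ladder time in {1, ..., n}])."""
    strict_ascending = Fraction(0)
    no_weak_descending = Fraction(0)
    for steps in product((1, -1), repeat=n):
        weight = Fraction(1)
        sums = [0]                                   # partial sums S_0, ..., S_n
        for y in steps:
            weight *= p if y == 1 else 1 - p
            sums.append(sums[-1] + y)
        # n is a strict ascending ladder time iff S_n > S_k for all k < n
        if all(sums[n] > sums[k] for k in range(n)):
            strict_ascending += weight
        # m is a weak descending ladder time iff S_m <= S_k for all k < m
        if not any(all(sums[m] <= sums[k] for k in range(m))
                   for m in range(1, n + 1)):
            no_weak_descending += weight
    return strict_ascending, no_weak_descending

p = Fraction(1, 3)
pairs = [ladder_probabilities(n, p) for n in range(9)]
print(pairs[:4])
```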


Theorem 22. The probability generating functions $\varphi^{++}$ and $\varphi^{-}$ of the
waiting time distributions for the regenerative sets of strict ascending ladder times
and weak descending ladder times of a random walk in R satisfy the relation

\[ [1 - \varphi^{++}(s)]\,[1 - \varphi^{-}(s)] = 1 - s, \quad 0 \le s \le 1. \]




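PROOF. The equality is clear for $s = 1$, so we take $s \in (0,1)$. For
$n = 0, 1, 2, \dots$, let $u_n^{++}$ denote the probability that n is a strict ascending ladder
time and $r_n^{-}$ the probability that n is the smallest positive weak descending
ladder time. Since $[1 - \varphi^{++}]^{-1}$ is the measure generating function of the sequence
$(u_n^{++}: n = 0, 1, \dots)$ and, according to the preceding lemma,
$u_n^{++} = 1 - \sum_{k=1}^{n} r_k^{-}$, we conclude that

\[ \frac{1}{1 - \varphi^{++}(s)} = \sum_{n=0}^{\infty} \Bigl( 1 - \sum_{k=1}^{n} r_k^{-} \Bigr) s^n \]
\[ \qquad = \frac{1}{1-s} - \sum_{k=1}^{\infty} \sum_{n=k}^{\infty} r_k^{-} s^n \]
\[ \qquad = \frac{1}{1-s} - \sum_{k=1}^{\infty} r_k^{-}\, \frac{s^k}{1-s} = \frac{1 - \varphi^{-}(s)}{1-s}. \qquad \Box \]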


Corollary 23. Either the set of strict ascending ladder times and the set 
of weak descending ladder times are both null recurrent or one of these sets is 
transient and the other is positive recurrent. 


Problem 33. Prove the preceding corollary. 


Problem 34. Formulate results analogous to the preceding theorem and corollary 
for the regenerative sets (25.5) and (25.6) identified in Proposition 2: the sets of 
strict descending ladder times and of weak ascending ladder times. 


Problem 35. For random walks in R other than those that are almost surely identically
zero prove that the set of strict ascending ladder times and the set of weak ascending 
ladder times are both transient or both null recurrent or both positive recurrent. 


In case the step distribution of a random walk in R assigns zero probability
to each one-point set, each weak ascending ladder time is, with probability one,
a strict ascending ladder time; and similarly for descending ladder times. In this
case we have

\[ [1 - \varphi^{++}(s)]\,[1 - \varphi^{--}(s)] = 1 - s, \]

where $\varphi^{--}$ is the probability generating function of the waiting time distribution
for the regenerative set of strict descending ladder times. Since $\varphi^{++} = \varphi^{--}$ for
a symmetric random walk we obtain the following fact, remarkable in that the
conclusion does not depend on further properties of the step distribution of the
random walk.


Corollary 24. The probability generating function of the waiting time
distribution for the regenerative set of strict (or weak) ascending ladder times of a
symmetric random walk whose step distribution assigns zero probability to each
one-point set is the function $s \mapsto 1 - \sqrt{1-s}$.




* Problem 36. Calculate the potential measure and waiting time distribution for the 
regenerative set described in Corollary 24. 


Example 1. Suppose that the support of the step distribution of the random
walk $S = (S_n: n \ge 0)$ consists of integers smaller than 2. Thus,

\[ P[S_1 = j] = q_j \quad \text{for } j = \dots, -2, -1, 0, 1, \]

and

\[ \sum_{j=-\infty}^{1} q_j = 1. \]

Let

\[ h(x) = \sum_{k=0}^{\infty} q_{1-k}\, x^k, \quad 0 \le x \le 1, \]

and let $\varphi^{++}$ be as in Theorem 22. For $0 \le s \le 1$,

\[ \varphi^{++}(s) = \sum_{n=1}^{\infty} r_n^{++} s^n, \]

where, for $n = 1, 2, \dots$, the number $r_n^{++}$ denotes the probability that the smallest
positive strict ascending ladder time equals n.

We want to relate $\varphi^{++}$ to h. For $m = 1, 2, \dots$, let $r_{n,m}^{++}$ denote the probability
that n is the $m^{\text{th}}$ smallest positive strict ascending ladder time. Then, for
$0 \le s \le 1$,

\[ [\varphi^{++}(s)]^m = \sum_{n=0}^{\infty} r_{n,m}^{++} s^n, \]

valid even for $m = 0$ with the conventions $r_{0,0}^{++} = 1$ and $r_{n,0}^{++} = 0$ for $n > 0$.
If $S_1(\omega) = 1 - m$ for some $m \ge 0$, then in order that $n \ge 1$ be the first
positive strict ascending ladder time, it is necessary and sufficient that $n - 1$ be
the $m^{\text{th}}$ positive strict ascending ladder time for the random walk with steps
$(S_2 - S_1), (S_3 - S_2), \dots$, a random walk that is independent of $S_1$. Therefore,

\[ r_n^{++} = \sum_{m=0}^{\infty} q_{1-m}\, r_{n-1,m}^{++}, \quad n > 0. \]

Multiply by $s^n$ and sum to obtain

\[ \varphi^{++}(s) = s \sum_{m=0}^{\infty} q_{1-m}\, [\varphi^{++}(s)]^m = s\, h(\varphi^{++}(s)). \]

Thus, we confront the issue of whether, for each $s \in [0,1]$, the equation $x = s h(x)$
has a unique solution in [0,1], which would then necessarily be the value of
$\varphi^{++}(s)$. We note that $h(1) = h(1-) = 1$. Thus, for $s < 1$, the function
$x \mapsto s h(x)$ is concave up, continuous, nonnegative at 0, and less than 1 at 1. It




follows that the equation $x = s h(x)$ has a unique solution for each $s \in [0,1)$, and
that solution equals $\varphi^{++}(s)$.

The equation $x = h(x)$ is of interest. Of course, the solution $x = 1$ is the
value of $\varphi^{++}(1)$, but in case of more than one solution, it is the smallest of the
solutions that necessarily equals $\varphi^{++}(1-)$. So, $h(x) = x$ has a unique solution,
namely $x = 1$, if and only if the set of strict ascending ladder times is recurrent.
It is easy to see that there is a unique solution if and only if $h(0) > 0$ and
$h'(1) \le 1$. Since $h'(1) = 1 - E(S_1)$, we conclude that the set of strict ascending
ladder times is recurrent if $E(S_1) \ge 0$ and $P[S_1 = 0] < 1$, as was expected. In
the case of transience, the probability that there is at least one positive ascending
ladder time equals the smallest solution of $x = h(x)$.

It is worth noting that only when $h(x) = x$ for all $x \in [0,1]$, a trivial case,
does the equation $h(x) = x$ have more than two solutions.
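The equations $x = s h(x)$ and $x = h(x)$ of Example 1 lend themselves to fixed-point iteration. The sketch below is an illustration under assumed data: the step distribution $q_1 = 0.3$, $q_0 = 0.2$, $q_{-1} = 0.5$ is hypothetical, so that $h(x) = 0.3 + 0.2x + 0.5x^2$ and $E(S_1) = -0.2 < 0$. Since h is increasing on [0,1], iterating from 0 produces an increasing sequence that converges to the smallest solution.

```python
# Hypothetical step distribution: q[j] = P[S_1 = j], supported on {-1, 0, 1}.
q = {1: 0.3, 0: 0.2, -1: 0.5}

def h(x):
    """h(x) = sum over k >= 0 of q_{1-k} x^k, written as a sum over j = 1 - k."""
    return sum(p * x ** (1 - j) for j, p in q.items())

def smallest_solution(s, iterations=500):
    """Iterate x <- s * h(x) from x = 0; the iterates increase to the smallest solution."""
    x = 0.0
    for _ in range(iterations):
        x = s * h(x)
    return x

phi_half = smallest_solution(0.5)          # approximates the solution of x = 0.5 * h(x)
prob_ladder = smallest_solution(1.0)       # smallest solution of x = h(x)
print(phi_half, prob_ladder)
```

For this choice the quadratic $x = h(x)$ has the two roots 0.6 and 1, so the iteration with $s = 1$ should settle near 0.6, the probability that there is at least one positive strict ascending ladder time.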


Problem 37. For a random walk S as described in the preceding example with
$E(S_1) < 0$, calculate the distribution of $\sup\{S_n: n \in \mathbb{Z}^+\}$.


Problem 38. For a random walk S as described in Example 1 with E(S1) > 0, find 
the probability that there is at least one positive weak descending ladder time. 


* Problem 39. For each simple random walk in Z, find the waiting time distributions 
and potential measures for the following four regenerative sets: the random sets of 
strict ascending ladder times, of strict descending ladder times, of weak ascending 
ladder times, and of weak descending ladder times. 


Problem 40. Let $q \in (0,1)$ and $c \in [0, 1-q]$. Follow the instructions of the
preceding problem for the random walk whose step distribution is given by

\[ P[S_1 = -k] = c\,q^k, \quad k = 0, 1, 2, \dots, \]
\[ P[S_1 = 1] = 1 - \frac{c}{1-q}. \]

For each q find those c for which the regenerative sets of all four types of ladder
times are recurrent.


CHAPTER 26 
Time-homogeneous Markov Sequences 


Markov sequences are so important that several chapters could be devoted to 
their study. We will mainly limit our coverage to two kinds of topics: (i) those 
that involve instructive applications of material presented earlier in this book 
and (ii) those that lay the groundwork for material on continuous-time Markov 
processes presented later in the book. 


26.1. Transition operators and discrete generators 


Most of this section is concerned with new terminology and notation. If X =
(X_n: n = 0,1,2,...) is a Markov sequence of Ψ-valued random variables, then
the space Ψ is called the state space of X. Throughout this chapter, all state
spaces will be Borel spaces. We identify the subscript n on X_n with time, so
that an event of the form [X_n ∈ B] might be read as “the state of the Markov
sequence X at time n is a member of B” or “X_n is in B at time n”.

Recall from Chapter 22 that the distribution of a time-homogeneous Markov
sequence (X_0, X_1, ...) in a Borel space Ψ is characterized by an initial
distribution R_0 and a transition function R satisfying


\[ P[X_{n+1} \in B \mid (X_0, \dots, X_n)] = R(X_n, B). \]


In this chapter, we often write Q_n for the distribution of X_n, n ≥ 0, so that
the initial distribution will usually be denoted by Q_0 rather than R_0. Also,
all Markov sequences treated in this chapter are time-homogeneous, so we will 
usually drop the adjective ‘time-homogeneous’. 

We now introduce some new objects that will take over the role of the tran- 
sition function R. 


Definition 1. Let Ψ be a Borel space on which a collection of distributions
M = (μ_x: x ∈ Ψ) is defined. Assume that for each measurable set B ⊂ Ψ, the
function x ⇝ μ_x(B) is measurable. The left transition operator T corresponding
to the collection M of transition distributions operates on the space of bounded
measurable R-valued functions on Ψ and is given by


\[ (T(f))(x) = \int f(y) \, \mu_x(dy). \]


The corresponding right transition operator, also denoted by T, operates on the
space of probability measures on Ψ and is given by


\[ ((Q_0)T)(B) = \int \mu_x(B) \, Q_0(dx). \]


To avoid a clutter of symbols, parentheses will usually be omitted from the 
notation introduced in the preceding definition. Thus, one might write (Tf)(x)
or Tf(x), and similarly, (Q_0 T)(B) or even Q_0 T(B).

We usually refer to a transition operator without the modifiers ‘left’ or ‘right’ 
since an explicit or implicit identification of domain serves to distinguish left 
from right. 

The connection between transition functions and transition operators is as 
follows. Given a transition function R, the transition distributions are given by 
μ_x = R(x, ·), x ∈ Ψ. The collection of transition distributions uniquely determines
the corresponding left and right transition operators T, so a transition
function R uniquely determines a corresponding transition operator T. 

On the other hand, since μ_x(B) = T I_B(x) for x ∈ Ψ and measurable sets B,
it is clear that a left transition operator T uniquely determines a corresponding
collection of transition distributions, which in turn determines a transition
function R. Similarly, a right transition operator T determines the collection
of transition distributions and the transition function, since μ_x = δ_x T. Thus,
there is a one-to-one correspondence between transition operators (either left or 
right) and transition functions R. We are led to reformulate the definition of 
time-homogeneous Markov sequence in the following equivalent way. 


Definition 2. Let (Ψ, G) be a Borel space and T a transition operator for
Ψ, defined in terms of a collection (μ_x: x ∈ Ψ) of probability measures on Ψ.
A random sequence X of Ψ-valued random variables adapted to a filtration
(F_n: n ≥ 0) is a time-homogeneous Markov sequence with respect to that filtration,
having state space Ψ and transition operator T, if for all n ≥ 0, μ_{X_n} is the
conditional distribution of X_{n+1} given F_n.


Given a transition operator T and initial distribution Q_0, the existence of a
corresponding Markov sequence (with respect to the minimal filtration) is guaranteed
by an easy application of Theorem 3 of Chapter 22, and the distribution
of X is uniquely determined by T and Q_0. However, there is no uniqueness in
the converse direction (see Problem 7).




Problem 1. Prove that the transition operator T takes bounded measurable func- 
tions to bounded measurable functions and probability measures to probability 
measures. 


Problem 2. If ν is a probability measure on a Borel space (Ψ, G) and f: Ψ → R
is a bounded measurable function, let νf denote ∫ f dν. With this notation, show
that if T is a transition operator, then (Q_0 T)f = Q_0(Tf).


Problem 3. Let T be an operator that takes the space of bounded measurable
functions on a Borel space Ψ to itself. Show that T is a left transition operator
if and only if: (i) T is linear; (ii) Tf ≥ 0 whenever f ≥ 0; (iii) T1 = 1, where 1
denotes the function that is identically equal to 1; (iv) lim_{n→∞} Tf_n(x) = 0 for every
x whenever (f_n: n = 1,2,...) is a decreasing sequence satisfying lim_{n→∞} f_n(x) = 0
for every x.


Problem 4. Let T be a measurable operator that takes the Borel space of probability
measures on a Borel space Ψ to itself. Show that T is a right transition operator
if and only if (bν_1 + (1 − b)ν_2)T = b(ν_1 T) + (1 − b)(ν_2 T) whenever 0 < b < 1.


* Problem 5. Let (X_n: n ≥ 0) be a Markov sequence with respect to a filtration
(F_n: n ≥ 0), with transition operator T. Let Q_n denote the distribution of X_n,
and f a bounded measurable function on the state space. Show that, for all n ≥ 0,


\[ Q_n T = Q_{n+1} \quad \text{and} \quad E(f \circ X_{n+1} \mid \mathcal{F}_n) = (Tf) \circ X_n \ \text{a.s.} \]


Problem 6. Show that if T is a transition operator, then T^k is a transition operator
for all k ≥ 1, where T^k denotes the k-fold composition of T with itself. Then
generalize Problem 5 to show that, for all n ≥ 0,


\[ Q_n T^k = Q_{n+k} \quad \text{and} \quad E(f \circ X_{n+k} \mid \mathcal{F}_n) = (T^k f) \circ X_n \ \text{a.s.} \]


Problem 7. For the state space Ψ = {0,1}, find two different transition operators T and T̃
such that the (deterministic) sequence X = (0,0,0,...) is a Markov sequence with
transition operator T and also a Markov sequence with transition operator T̃.


Technically speaking, a time-homogeneous Markov sequence consists of a ran- 
dom sequence, a state space, a filtration, a transition operator, and the distribu- 
tion of the first term of the sequence. The state space is often implicit, and an 
unmentioned filtration is usually the minimal filtration. The transition operator 
is usually mentioned explicitly, and if one speaks of a particular random se- 
quence in connection with two different transition operators, as in the preceding 
exercise, then one is speaking of two different Markov sequences. 

While both the transition operator T and the initial distribution Q_0 are needed
to uniquely determine the distribution of a Markov sequence, we will see that
many of the most important properties of a given sequence have more to do with
properties of T than with properties of Q_0. Thus it is common to simultaneously
consider all of the Markov sequences corresponding to a given choice of T. In
doing so, it is convenient to use a single underlying measurable space. Given a
Borel space (Ψ, G), the corresponding canonical sample space is


\[ \Omega = \Psi \times \Psi \times \cdots, \]


to which we assign the usual product σ-field, denoted by F. Thus each sample
point ω ∈ Ω is a sequence (ω_0, ω_1, ...) of members of Ψ. In this context, we will
always use the symbol X to denote the random sequence (X_0, X_1, ...), where
for each n ≥ 0,


\[ X_n(\omega) = \omega_n. \tag{26.1} \]


Fix a transition operator T. It and an initial measure Q_0, specified to be the
distribution of X_0, determine a unique measure P^{Q_0} on (Ω, F). The slightly
abbreviated notation P^x is used for P^{δ_x}. When X is governed by P^x, we say
that x is the initial state of X. The expectation operators associated with P^x
and P^{Q_0} are denoted by E^x and E^{Q_0} respectively.

The transition operator T determines and is determined by the family of
distributions (P^x: x ∈ Ψ), called the Markov family with transition operator T,
and this Markov family determines the probability measures P^{Q_0} via the formula
in Problem 8.

The following exercises are intended to give the reader some practice with the 
notation just introduced. 


Problem 8. Show that for a given transition operator T, the probability measures
P^x, x ∈ Ψ, determine the probability measures P^{Q_0} for arbitrary Q_0 by way of


\[ P^{Q_0}(A) = \int_{\Psi} P^x(A) \, Q_0(dx) \]

for all A ∈ F.


Problem 9. Show that

\[ E^x(f \circ X_k) = T^k f(x), \quad k \ge 0, \]

for all bounded measurable f: Ψ → R.


Problem 10. Show that for a Markov sequence with transition operator T, the
conditional distribution of the random sequence (X_k, X_{k+1}, X_{k+2}, ...) given X_k is
P^{X_k}. Generalize to allow k to be a stopping time.


We conclude this section with one further definition. 


Definition 3. The discrete generator of a Markov sequence and of its transition
operator T is the operator G = T − I, where I is the identity operator.




The connection between T and G is so simple that it hardly seems worthwhile 
to define G. However, one of our purposes here is to prepare the reader for 
Markov processes in continuous time, for which generators are nontrivial and 
very useful. 


26.2. Examples 


When Markov sequences are studied in elementary courses, their state spaces are 
usually finite or countable. In such cases, transition operators can be represented 
by matrices. 


Example 1. [Finite state space] Let Ψ be the finite set {1,2,...,m}. In this
case, the R-valued functions defined on Ψ can be identified with vectors in R^m.
By convention, we represent such a vector f by the column matrix with entries
f(x), x = 1,...,m. Then a transition operator T can be regarded as an m × m
matrix (T(x,y): 1 ≤ x,y ≤ m) in which each entry is nonnegative and each row
sum satisfies


\[ \sum_{y=1}^{m} T(x,y) = 1 \quad \text{for } x = 1, \dots, m. \]


The condition that the row sums equal 1 arises from the relation μ_x{y} = T(x,y),
where μ_x, 1 ≤ x ≤ m, are the transition distributions. The matrix T operates
on column matrices f by multiplication on the left, so that the notation Tf used
earlier is consistent with matrix multiplication.

Similarly, matrix multiplication is useful when regarding T as a right tran- 
sition operator. We identify each probability measure Q_0 on Ψ with the row
matrix whose entries are the quantities Q_0({x}), x = 1,2,...,m. It is easily
checked that when this identification is made, the probability measure Q_0 T is
identified with the matrix Q_0 T.

Here is a specific example. Let m = 5 and


\[ T = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
\frac{2}{3} & 0 & \frac{1}{3} & 0 & 0 \\
0 & \frac{2}{3} & 0 & \frac{1}{3} & 0 \\
0 & 0 & \frac{2}{3} & 0 & \frac{1}{3} \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}. \]


We let our Markov sequence start at 3. Then the sequence corresponds to the
gambler’s ruin problem, in which the gambler starts with 3 dollars, wins a dollar
each time with probability 1/3, loses a dollar each time with probability 2/3, and
quits as soon as the fortune reaches either 1 dollar or 5 dollars. The gambler’s
fortune at time 2 is a random variable whose distribution is the row matrix
(0, 0, 1, 0, 0)T^2 = (4/9, 0, 4/9, 0, 1/9).
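The calculation above is easy to check by machine. The following Python sketch multiplies the row matrix for the initial state 3 by T twice, using exact fractions. It assumes that the win and loss probabilities are 1/3 and 2/3; the name `row_times_matrix` is ours and is introduced only for illustration.

```python
from fractions import Fraction as F

# Gambler's ruin on states 1..5 (list indices 0..4); states 1 and 5 absorb.
# Assumption: win probability 1/3, loss probability 2/3.
win, lose = F(1, 3), F(2, 3)
T = [
    [F(1), F(0), F(0), F(0), F(0)],
    [lose, F(0), win,  F(0), F(0)],
    [F(0), lose, F(0), win,  F(0)],
    [F(0), F(0), lose, F(0), win],
    [F(0), F(0), F(0), F(0), F(1)],
]

def row_times_matrix(q, t):
    """Right action Q -> QT: a row vector times a square matrix."""
    n = len(t)
    return [sum(q[x] * t[x][y] for x in range(n)) for y in range(n)]

q0 = [F(0), F(0), F(1), F(0), F(0)]  # point mass at state 3
q1 = row_times_matrix(q0, T)         # distribution of the fortune at time 1
q2 = row_times_matrix(q1, T)         # distribution of the fortune at time 2
print(q2)
```

Applying `row_times_matrix` k times in this way computes Q_0 T^k without ever forming the matrix power itself.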




We may generalize the matrix representation in the preceding example to 
arbitrary countable state spaces. The following result uses the notation of Prob- 
lem 2. 


Proposition 4. If T is a transition operator for a countable state space Ψ,
then T may be represented by a matrix, also denoted by T, with entries T(x, y)
given by the formula


\[ T(x,y) = \delta_x T f_y = P^x(X_1 = y), \quad x, y \in \Psi, \]


where δ_x is the point mass at x and f_y is the indicator function of the singleton
{y}. This representation has the property that for any distribution Q_0 and
bounded function f on Ψ, the distribution Q_0 T and the function Tf are represented
by the respective matrix products Q_0 T and Tf, and for any nonnegative
integer k, the operator T^k is represented by the matrix product T^k.


For countable state spaces Ψ, any matrix T = (T(x,y): x,y ∈ Ψ) with nonnegative
entries and with row sums equal to 1 is called a transition matrix. On
any countable state space, there is a one-to-one correspondence between the set 
of transition matrices and the set of transition operators. 


Problem 11. Prove Proposition 4. 


Problem 12. Make and interpret some calculations for the transition matrix 


10 0 0 0 

2 0 4 0 0 

0 2 0 4 0 

00 20 3 
T=!0 0 0 2 0 

000 0 2 

000 0 0 


similar to those made in Example 1. 


Problem 13. Let X be a random walk on Z. Show that X is a Markov sequence, 
and determine the corresponding transition matrix and discrete generator in terms 
of the step distribution of the random walk. 


Example 2. [Birth-death sequences] A birth-death sequence is a Markov sequence
(X_n: n ≥ 0) with state space Z^+ whose transition probability measures
μ_x satisfy

\[ \mu_x(\{x-1, x, x+1\}) = 1 \quad \text{for } x > 0 \]

and μ_0({0,1}) = 1. In terms of the corresponding transition matrix T, the
condition is that T(x,y) = 0 if |x − y| > 1. It follows from this condition that
|X_n − X_{n−1}| ≤ 1 a.s. for all n > 0. If X_n = X_{n−1} + 1, we say that a birth occurs
at time n. If X_n = X_{n−1} − 1, we say a death occurs at time n. Note that unless
all of the diagonal entries of T are 0, it is not necessarily true that all transitions
are births or deaths.

Birth-death sequences can be regarded as simple models of changing popu- 
lation sizes. In such a model, the population size can only increase or decrease 
by at most 1 during each time step. When thinking about such a model, it is 
sometimes best not to think of the index n in Xn as marking the passage of time 
intervals of uniform length, but rather as indicating the times at which changes 
in the population might occur. 


Example 3. [Branching processes] Let μ be a probability distribution on Z^+.
A branching process with branching distribution μ is a Markov sequence on Z^+
having transition probability measures


\[ \mu_x = \mu^{*x}. \]

The branching distribution is also called the offspring distribution.

Like birth-death sequences, branching processes can be interpreted as population
models. When the branching process is in state x, we think of the population
as having x members, each of which independently goes through a reproductive
cycle. During each cycle the member dies and produces a random number of
offspring. The distribution μ gives the probabilities of the various outcomes:
μ{k} is the probability that there are k offspring from a particular individual.
Thus, the convolution μ^{*x} is the distribution of the upcoming population, given
that the population is currently x.
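To make the convolution power concrete, here is a small Python sketch that computes one row of the transition matrix of a branching process from a finitely supported offspring distribution. The distribution `mu` used below and the function names are hypothetical choices made only for illustration.

```python
from fractions import Fraction as F

def convolve(a, b):
    """Convolution of two distributions on {0, 1, 2, ...} given as lists."""
    out = [F(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def transition_row(mu, x):
    """x-fold convolution of mu; the 0-fold convolution is the point mass at 0."""
    row = [F(1)]
    for _ in range(x):
        row = convolve(row, mu)
    return row

# Hypothetical offspring distribution: 0, 1, or 2 offspring.
mu = [F(1, 4), F(1, 2), F(1, 4)]
row2 = transition_row(mu, 2)  # distribution of the next population size, given size 2
print(row2)
```

Note that `transition_row(mu, 0)` is the point mass at 0, which reflects the fact that an empty population stays empty.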


Problem 14. Suppose that μ{0} = μ{2} = 1/2 in the preceding example. Calculate
μ_2 and μ_3.


Example 4. Let μ be a distribution on Z̄^+ \ {0}. For x ∈ Z̄^+ \ {0}, set


\[ \mu_x = \begin{cases}
\mu & \text{if } x = 1 \\
\delta_{x-1} & \text{if } 1 < x < \infty \\
\delta_{\infty} & \text{if } x = \infty.
\end{cases} \]


Let (X_n: n = 0,1,...) be a Markov sequence with initial state 1 and transition
distributions μ_x, x ∈ Z̄^+ \ {0}. Set Y_n = I_{[X_n = 1]}. It is rather obvious
that (Y_n: n ≥ 0) is a renewal sequence with waiting time distribution μ. The
correspondence thus obtained between renewal sequences and certain Markov sequences
is illustrated in Figure 26.1. If the initial distribution δ_1 for the Markov
sequence is replaced by an arbitrary distribution Q_0, the sequence (Y_n) is a
delayed renewal sequence. A special Q_0 is given in the next problem.
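The correspondence just described can be traced by hand over a finite stretch of time. The Python sketch below takes a list of finite waiting times, assumed to have been drawn from the waiting time distribution at the successive visits to state 1, and produces the path of X together with the sequence Y. The helper name `residual_chain_path` is hypothetical.

```python
def residual_chain_path(waiting_times):
    """Path of the Markov sequence started at 1, given the values drawn from
    the waiting time distribution at each visit to state 1 (assumed finite)."""
    path = [1]
    for w in waiting_times:
        # From state 1 the sequence jumps to w, then steps down w-1, ..., 1.
        path.extend(range(w, 0, -1))
    return path

def renewal_indicator(path):
    """Y_n = 1 exactly when X_n = 1."""
    return [1 if x == 1 else 0 for x in path]

xs = residual_chain_path([3, 1, 2])
ys = renewal_indicator(xs)
print(xs)
print(ys)
```

The gaps between successive 1's in `ys` reproduce the waiting times that were supplied, which is the content of the correspondence.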




FIGURE 26.1. Correspondence between a renewal sequence and 
a certain Markov sequence 


Problem 15. Suppose that the waiting time distribution μ in the preceding example
has finite mean. Let (X_n: n ≥ 0) be the corresponding Markov sequence with
initial distribution Q_0 defined by


\[ Q_0\{n\} = \frac{\mu[n, \infty)}{\sum_{j=1}^{\infty} \mu[j, \infty)}. \tag{26.2} \]

Show that the distribution of each X_n is Q_0. Also show that the corresponding delayed
renewal sequence (Y_n) is ‘stationary’, in the sense that it has the same distribution
as the sequence (Z_n) defined by Z_n = Y_{n+1}, n = 0,1,2,.... Compare with
Theorem 15 of Chapter 25, resolving any apparent differences between formulas.


Problem 16. Let μ be as in Example 4, and define transition distributions μ_x for
a Markov sequence (X_n: n ≥ 0) on Z^+ \ {0} by


\[ \mu_x\{1\} = \frac{\mu\{x\}}{\mu[x, \infty]} = 1 - \mu_x\{x+1\}. \]


Show that if the initial state is 1, then the random sequence (Y_n) defined by Y_n =
I_{[X_n = 1]} is a renewal sequence with waiting time distribution μ. See Figure 26.2 for
an illustration of the correspondence identified in this problem.


FIGURE 26.2. Correspondence between a renewal sequence and
a second Markov sequence


Example 5. [Network walks] Let G be a graph, meaning that G consists
of a set of points called ‘vertices’ and a collection of ‘edges’ connecting certain
pairs of vertices. Two vertices are neighbors if they are connected by an edge.
We assume that every vertex in G has at least one neighbor, and also that G is
locally finite, meaning that each vertex has only finitely many neighbors. Let Ψ be
the vertex set of G, and for each x ∈ Ψ, let n_x denote the number of neighbors
of x. Define a transition matrix T by


\[ T(x,y) = \begin{cases}
\frac{1}{n_x} & \text{if } x \text{ and } y \text{ are neighbors} \\
0 & \text{otherwise.}
\end{cases} \]

Here is an intuitive description of a Markov sequence X with transition op- 
erator T. Given that X is in state x at time n, then at time n + 1, it will be at
one of the neighbors of x, each with equal probability. Sometimes X is called a
‘random walk on G’, although according to the definition of random walk given 
in Chapter 11, X is not a random walk unless G is very special. 
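For a finite graph the transition matrix of a network walk can be written down mechanically. The following Python sketch builds it from a table of neighbors; the function name and the small path graph are hypothetical, and the graph is assumed to be finite with every vertex having at least one neighbor.

```python
from fractions import Fraction as F

def network_walk_matrix(neighbors):
    """Return T with T[x][y] = 1/n_x if y is a neighbor of x, and 0 otherwise.

    `neighbors` maps each vertex to the set of its neighbors; the relation is
    assumed symmetric, and every vertex is assumed to have a neighbor.
    """
    vertices = sorted(neighbors)
    return {
        x: {y: (F(1, len(neighbors[x])) if y in neighbors[x] else F(0))
            for y in vertices}
        for x in vertices
    }

# Hypothetical example: the path graph a - b - c.
graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
T = network_walk_matrix(graph)
print(T["b"])
```

Each row of the resulting matrix sums to 1 because the n_x neighbors of x each receive probability 1/n_x.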


Example 6. [Probabilistic cellular automata] Let Ψ = {0,1}^Z. For this more
complicated state space, it will be convenient for us to change our usual notation.
We will write ψ for a typical state, and use the letter z for a typical member of
Z. Each state ψ is a sequence of 0’s and 1’s indexed by the integers. We write
ψ(z) for the element in the sequence ψ that is indexed by the integer z. A state
ψ is called a configuration in this context. We think of ψ as the ‘global’ state of
a one-dimensional array of 2-state components or cells. We call z the site of the
cell located at z, and ψ(z) is the ‘local state’ at the site z.

For each integer z, let f_z: Ψ → R be defined by


\[ f_z(\psi) = \psi(z). \]


We will start to define an operator T by saying how T acts on the functions f_z.
For each η ∈ {0,1} × {0,1}, let S_η be a transition operator on the state space
{0,1}, and let g be the identity function on {0,1}. Define


\[ T f_z(\psi) = S_{(\psi(z-1), \psi(z+1))} \, g(\psi(z)). \]

We extend T to products of functions f_z by defining


\[ T\Bigl(\prod_{z \in A} f_z\Bigr) = \prod_{z \in A} T f_z \tag{26.3} \]


for all finite sets A ⊂ Z. Now extend T to finite linear combinations of such
products by linearity. We leave it as an exercise for the reader to extend T to
bounded measurable functions by taking limits and then showing that T is, in
fact, a transition operator.

We can describe Markov sequences X with transition operator T as follows.
Suppose the sequence is in state ψ at time n. Now consider a particular site z.
The local state at the cell at z will make a transition according to the transition
matrix S_η, where η = (ψ(z − 1), ψ(z + 1)). Thus, the local transition probability
at z depends on the states of the cells at the sites z − 1 and z + 1. We say that the
cell at z updates according to S_η, and, as a consequence of (26.3), conditioned on
the state ψ, all of the cells update independently. We have described the cellular
automaton with local update rules {S_η, η ∈ {0,1} × {0,1}}. Continuous-time
versions of cellular automata form the subject matter of Chapter 32.
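The update mechanism can be imitated on a computer if the infinite array is replaced by a finite ring of cells. That replacement is an assumption made here purely so that the configuration fits in memory; it is not part of the example. In the Python sketch below, the table `prob_one` is a hypothetical encoding of the local update rules: its entry for a neighbor pair and a current local state is the probability that the next local state is 1.

```python
import random

def pca_step(config, prob_one, rng):
    """One synchronous update of a finite ring of 0/1 cells.

    config: list of local states, with periodic boundary (an assumption).
    prob_one[(left, right)][current]: probability that the next local state
        is 1 when the neighbors are (left, right) and the cell is `current`.
    """
    n = len(config)
    new = []
    for z in range(n):
        eta = (config[z - 1], config[(z + 1) % n])
        p = prob_one[eta][config[z]]
        # Given the current configuration, the cells update independently.
        new.append(1 if rng.random() < p else 0)
    return new

# Hypothetical deterministic rule: the next state is the XOR of the neighbors.
xor_rule = {(a, b): [float(a ^ b), float(a ^ b)] for a in (0, 1) for b in (0, 1)}
state = pca_step([0, 0, 1, 0, 0], xor_rule, random.Random(0))
print(state)
```

With probabilities strictly between 0 and 1 in the table, repeated calls to `pca_step` simulate a genuinely random sequence of configurations.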


Problem 17. Fill in the details for the construction of the transition operator T in 
the preceding example. 


26.3. Martingales and the strong Markov property 
There is an important connection between Markov sequences and martingales. 


Definition 5. Let G be an operator with domain and target equal to the
set of bounded measurable functions on some Borel space Ψ, and let (F_n: n ≥
0) be a filtration in a probability space (Ω, F, P). A sequence X of Ψ-valued
random variables on (Ω, F, P) is a solution to the martingale problem for G and
(F_n: n ≥ 0) if for all bounded measurable functions f: Ψ → R, the sequence Y
defined by


\[ Y_n = f \circ X_n - \sum_{k=0}^{n-1} (Gf) \circ X_k \]


is a martingale with respect to the filtration (F_n: n ≥ 0).


Theorem 6. Let G, Ψ, (F_n: n ≥ 0), and (Ω, F, P) be as in the preceding
definition. Then a sequence X of Ψ-valued random variables is a Markov sequence
with respect to (F_n: n ≥ 0) and having discrete generator G if and only
if X is a solution to the martingale problem for G and (F_n: n ≥ 0).
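The computation at the heart of one direction of Theorem 6 is short, and we sketch it here. Assume that X is a Markov sequence with transition operator T = G + I, and use the identity E(f ∘ X_{n+1} | F_n) = (Tf) ∘ X_n from Problem 5. This is only a sketch of the key step; a complete proof must also treat the converse direction.

```latex
\begin{aligned}
E(Y_{n+1} - Y_n \mid \mathcal{F}_n)
  &= E(f \circ X_{n+1} \mid \mathcal{F}_n) - f \circ X_n - (Gf) \circ X_n \\
  &= (Tf) \circ X_n - f \circ X_n - ((T - I)f) \circ X_n \\
  &= 0 \quad \text{a.s.}
\end{aligned}
```

The first line uses the fact that Y_{n+1} − Y_n = f ∘ X_{n+1} − f ∘ X_n − (Gf) ∘ X_n, in which the last two terms are measurable with respect to F_n.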




Problem 18. Prove the preceding theorem. 


* Problem 19. Suppose that X = (X_n: n ≥ 0) is a [0,1]-valued random sequence
that is both a time-homogeneous Markov sequence and a submartingale with respect
to some filtration (F_n: n ≥ 0). Obtain the Doob decomposition of X as a
corollary of Theorem 6.


Theorem 6 is useful in two ways. First, it gives us a large collection of martin- 
gales that can be used to analyze a given Markov sequence. Second, it provides 
us with a way to recognize that a random sequence is Markov. This second use is 
particularly valuable in the continuous-time version of this theory, but is usually 
ignored in the case of Markov sequences. Nevertheless, we will show how it can 
be applied by using it to prove the strong Markov property. 


Theorem 7. [Strong Markov property] Let X be a Markov sequence with
respect to a filtration (F_n: n ≥ 0), with transition operator T. Let τ be an
almost surely finite stopping time relative to the filtration (F_n: n ≥ 0). Then the
random sequence Y defined by


\[ Y_n = X_{\tau+n}, \quad n \ge 0, \]


is a Markov sequence with transition operator T that is adapted to the filtration
(F_{τ+n}: n ≥ 0).


PROOF. This result follows immediately from Theorem 6 and the Optional
Sampling Theorem. □


Problem 20. Prove the strong Markov property without using martingales. 


Let G be a discrete generator on a state space Ψ and let A be a Borel subset
of Ψ. A bounded measurable function f: Ψ → R is G-subharmonic on A if
Gf(x) ≥ 0 for all x ∈ A. It is G-superharmonic on A if −f is G-subharmonic
on A. It is G-harmonic on A if it is both G-subharmonic and G-superharmonic
on A. If A = Ψ, then the phrase ‘on A’ is often dropped. The modifier ‘G-’ is
also often omitted when only one discrete generator is being considered.


Problem 21. Let G be the discrete generator of a Markov sequence X. Let A be
a Borel set and τ = inf{n: X_n ∈ A^c}. If f is a bounded measurable function on
the state space that is G-subharmonic on A, show that the random sequence Y
defined by


\[ Y_n = f \circ X_{n \wedge \tau} \]


is a submartingale with respect to the same filtration to which X is adapted.




Problem 22. Let X be a simple random walk with step distribution pδ_1 + (1 −
p)δ_{−1}. Find the discrete generator G of X and show that the function


\[ f(x) = \left( \frac{1-p}{p} \right)^{x} \]


is G-harmonic. (Refer back to Chapter 24 and our treatment of the gambler’s ruin
problem for an application of this fact.)


26.4. Hitting times and return times 


Let T be a transition operator with state space Ψ and discrete generator G. For
each Borel set B ⊂ Ψ, set


\[ \pi_{x,B} = P^x[X_n \in B \text{ for some } n \ge 0], \]

called the hitting probability of B starting at x, and

\[ \pi^{+}_{x,B} = P^x[X_n \in B \text{ for some } n > 0]. \]


Note that π_{x,B} = 1 for x ∈ B and π_{x,B} = π^+_{x,B} for x ∈ B^c. For x ∈ B, π^+_{x,B} is
called the return probability to B starting at x. We often write π_{xy} and π^+_{xy} for
π_{x,B} and π^+_{x,B} when B = {y}.


Problem 23. Prove that

\[ T\pi_{\cdot,B} = \pi^{+}_{\cdot,B} \]


for any transition operator and every Borel subset B of the state space.


Problem 24. Let G denote the discrete generator of some Markov sequence X, and
B a measurable subset of its state space Ψ. Let f(x) = π_{x,B} and g(x) = π^+_{x,B} for
x ∈ Ψ. Prove that f and g are G-superharmonic on Ψ and that f is G-harmonic
on B^c. Also show that lim_{n→∞} f ∘ X_n and lim_{n→∞} g ∘ X_n exist almost surely.


The preceding problem constitutes part of the proof of the following theorem,
which characterizes the function x ⇝ π_{x,B}.


Theorem 8. Let G be the discrete generator of a Markov sequence, and B
a measurable subset of its state space. Then x ⇝ π_{x,B} is the minimal bounded
function h having the following properties:

(i) h(x) = 1 for x ∈ B,

(ii) h is G-harmonic on B^c,

(iii) h(x) ≥ 0 for all x,

where the word ‘minimal’ is used in the sense that if h is any other function
satisfying properties (i)-(iii), then π_{x,B} ≤ h(x) for all x.




PROOF. That the function x ⇝ π_{x,B} has properties (i) and (iii) is obvious.
Property (ii) is contained in Problem 24.

To complete the proof, suppose that h satisfies (i), (ii), and (iii). Set τ =
inf{n: X_n ∈ B} and define Y = (Y_0, Y_1, ...) by Y_n = h ∘ X_{n∧τ}. By Problem 21,
Y is a bounded martingale. Let Y_∞ denote its limit. Clearly, Y_∞(ω) ≥ 0 for all
ω and equals 1 if τ(ω) < ∞. Therefore,


\[ \pi_{x,B} \le E^x(Y_\infty) = E^x(Y_0) = E^x(h \circ X_0) = h(x). \quad \Box \]
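For a finite state space, the minimality in Theorem 8 suggests a simple numerical scheme: start from the indicator function of B and repeatedly set h equal to 1 on B and to Th off B. The n-th iterate is the probability of hitting B by time n, so the iterates increase to the hitting probability. The Python sketch below applies this to a gambler's ruin chain; the function name, the number of sweeps, and the up-probability 1/3 are assumptions chosen for illustration.

```python
def hitting_probabilities(T, B, sweeps=2000):
    """Approximate the hitting probabilities of B for a finite transition
    matrix T (a list of rows), by monotone iteration from the indicator of B."""
    n = len(T)
    h = [1.0 if x in B else 0.0 for x in range(n)]
    for _ in range(sweeps):
        # h <- 1 on B, Th off B; iterate k is P^x[hit B by time k].
        h = [1.0 if x in B else sum(T[x][y] * h[y] for y in range(n))
             for x in range(n)]
    return h

# Gambler's ruin on indices 0..4 with absorbing ends.
# Assumption: the chain moves up with probability 1/3, down with 2/3.
p = 1.0 / 3.0
T = [
    [1, 0, 0, 0, 0],
    [1 - p, 0, p, 0, 0],
    [0, 1 - p, 0, p, 0],
    [0, 0, 1 - p, 0, p],
    [0, 0, 0, 0, 1],
]
h = hitting_probabilities(T, B={0})
print(h)
```

Starting the iteration from a larger function, such as the constant 1, would also produce a solution of (i)-(iii), but not necessarily the minimal one; here it would return the constant 1 on the states other than the upper absorbing state.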


Example 7. For the state space R^+ \ {0} consider a Markov sequence with
transition distributions μ_x = ½δ_{x/2} + ½δ_{4x}, where δ_y denotes the delta distribution
at y. We set B = (0,1] with the goal of calculating the function x ⇝ π_{x,B}.

In order to apply Theorem 8 we need the following formula:


\[ Gh(x) = \tfrac{1}{2} h(x/2) - h(x) + \tfrac{1}{2} h(4x). \]


A change of variables is helpful: set g(y) = h(2^y), y ∈ R. The requirement (ii)
of Theorem 8 becomes


\[ \tfrac{1}{2} g(y-1) - g(y) + \tfrac{1}{2} g(y+2) = 0, \quad y > 0. \tag{26.4} \]


Let us begin by restricting attention to y ∈ Z^+ \ {0}. The equation (26.4)
is what is known as a third-order homogeneous linear difference equation. We
make the educated guess that there is a solution of the form k^y, and obtain the
following condition on k:

\[ \tfrac{1}{2} - k + \tfrac{1}{2} k^3 = 0. \]

Solving for the three values of k and putting the solutions together just as one
does for differential equations, we conclude that every solution of (26.4) is of the
form

\[ a \left( \frac{-1+\sqrt{5}}{2} \right)^{y} + b + c \left( \frac{-1-\sqrt{5}}{2} \right)^{y}, \quad y \in \mathbb{Z}^+. \]

By (iii) of Theorem 8, applied for large odd y if c > 0 and large even y if c < 0,
we conclude that c = 0. By (i) of that theorem, we obtain a + b = 1. Finally, to
get a minimal solution without violating condition (iii) we take b = 0. Thus,


\[ g(y) = \left( \frac{-1+\sqrt{5}}{2} \right)^{y}, \quad y \in \mathbb{Z}^+. \]


A little thought shows that g(y) = g(⌈y⌉) for all y > 0 and g(y) = 1 for y ≤ 0.
Therefore,


\[ \pi_{x,B} = \begin{cases}
1 & \text{if } 0 < x \le 1 \\
\left( \frac{-1+\sqrt{5}}{2} \right)^{m} & \text{if } 2^{m-1} < x \le 2^{m}, \ m = 1, 2, \dots.
\end{cases} \]

Problem 25. Let Ψ = {0,1,2,...,g} for some positive integer g. Fix p ∈ (0,1) and
for x ∈ Ψ, let μ_x be the uniform distribution on Ψ if x = 0, the delta distribution
δ_g if x = g, and pδ_{x+1} + (1 − p)δ_{x−1} for other x. Calculate the functions £ ~> mz
and z ~ one 




Problem 26. For all simple random walks in Z, calculate the functions x ⇝ π_{x0}
and x ⇝ π^+_{x0}.


Problem 27. For a Markov sequence on Z^+ \ {0} with transition operator T given
by


\[ Tf(x) = \tfrac{1}{2} f(x+1) + \tfrac{1}{2} f(1), \]


calculate the function x ⇝ π_{x1}. Discuss the relation between this problem and
Problem 16.


Problem 28. For a Markov sequence on Z* \ {0} with transition operator T given 
by 
(x — 1)(x + 2) 
T a aa eee 

f(z) r(x +1) r(x + pF) 
calculate the function £ ~> am,1. Discuss the relation between this problem and 
Problem 16. 


f(a+1)+ 


Problem 29. For a Markov sequence on Z with transition operator T given by

\[ Tf(x) = \tfrac{1}{2} f(x-1) + \tfrac{1}{2} f(x+2), \]

calculate the function x ⇝ π_{x0}.


Problem 30. For a Markov sequence on Z* with transition operator T given by 


f(z —1)+if(£+2) ife>0 
Tf(z)= 
f(e) ifx=0 


calculate the function x ⇝ π_{x,{0,3}}.


Problem 31. For the birth-death sequence with transition distributions pz given 


by fe{e+1} = z% and for z > 0, pz{x—1} = zy calculate the function £ ~> fzo- 


Theorem 9. Let ρ denote the generating function of the offspring distribution
of a branching process. Then for this process, π_{x0} = c^x, where c is the smallest
root in [0,1] of the equation ρ(s) = s.


PROOF. A branching process with initial state x ≥ 1 is constructed from x
independent processes with initial state 1. Once any of these independent processes
hits 0 it stays at 0. Hence, the branching process with initial state x hits 0 if and
only if all x of the independent processes hit 0. We conclude that π_{x0} = (π_{10})^x.

Using the fact that x ⇝ (π_{10})^x is harmonic we obtain


\[ \sum_{y} (\pi_{10})^{y} \, \mu^{*x}\{y\} - (\pi_{10})^{x} = 0. \]


This condition simplifies to


\[ [\rho(\pi_{10})]^{x} = (\pi_{10})^{x}, \]


from which it follows that π_{10} is a solution of the equation ρ(s) = s. Moreover,
for any solution c of this equation, the function x ⇝ c^x is harmonic. The desired
conclusion follows from the minimality assertion in Theorem 8. □
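The smallest root in [0,1] of the fixed-point equation for the generating function can be approximated by iterating the generating function starting from 0: since the generating function is increasing on [0,1], the iterates increase to the smallest fixed point. The Python sketch below does this for a finitely supported offspring distribution. The function name, the iteration count, and the particular distribution are hypothetical choices for illustration.

```python
def extinction_probability(mu, iterations=10_000):
    """Approximate the smallest root in [0, 1] of rho(s) = s, where rho is the
    generating function of the offspring distribution mu (mu[k] is the
    probability of k offspring). Iterates s <- rho(s) starting from s = 0."""
    def rho(s):
        return sum(p * s ** k for k, p in enumerate(mu))
    s = 0.0
    for _ in range(iterations):
        s = rho(s)
    return s

# Hypothetical offspring distribution: 0, 1, 2 offspring w.p. 1/6, 1/2, 1/3.
c = extinction_probability([1 / 6, 1 / 2, 1 / 3])
hit_zero_from_3 = c ** 3  # hitting probability of 0 from initial state 3
print(c, hit_zero_from_3)
```

For this distribution the equation reduces to 2s² − 3s + 1 = 0, with roots 1/2 and 1, so the iteration should settle near 1/2.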


Problem 32. Let p ∈ (0,1). For a branching process with branching distribution
pδ_2 + (1 − p)δ_0, calculate the function x ⇝ π_{x0}.


Problem 33. Let q ∈ (0,1) and μ{x} = (1 − q)q^x, x ∈ Z^+. Calculate x ⇝ π_{x0} for
a branching process with branching distribution μ.


26.5. Renewal theory and Markov sequences 


We have already seen some connections between Markov sequences and renewal 
sequences in Example 4 and Problem 16. In this section we exploit another such 
connection. Although some of the results apply generally, we will restrict our 
attention to Markov sequences with countable state spaces. In this context, a 
transition operator T is represented by a transition matrix T = (T(x,y): x,y ∈
Ψ), as described in Proposition 4. We will be interested in the asymptotic
behavior of T^k as k → ∞. Denote the entries of the matrix T^k by T^k(x,y)
for nonnegative integers k. It follows from Proposition 4 and Problem 6 that
T^k(x,y) = P^x[X_k = y], so the results below will give information about the
asymptotic behavior of X_k as k → ∞.

Let X = (X_n: n ≥ 0) be a time-homogeneous Markov sequence with state
space Ψ, let y be a fixed state in Ψ, and define Y = (Y_n: n ≥ 0) by the formula


\[ Y_n = I_{\{y\}} \circ X_n. \]


Thus, Y is a random sequence of 1’s and 0’s, with the 1’s indicating the times 
when the Markov sequence X visits the state y. It is a straightforward conse- 
quence of the strong Markov property that Y is a delayed renewal sequence. 

Given a transition operator T for a countable state space Ψ, we have defined a
delayed renewal sequence Y on the probability space (Ω, F, P^x) for each x ∈ Ψ.
When x = y, this delayed renewal sequence is a renewal sequence, so there
is a renewal sequence corresponding to each state. We use the classification
of renewal sequences in Chapter 25 to classify the states in Ψ. A state y is
transient, null recurrent, or positive recurrent according to the classification of
its corresponding renewal sequence. The period γ_y of this renewal sequence is
called the period of the state y, and y is aperiodic if γ_y = 1. We denote the mean
waiting time of the renewal sequence corresponding to y by m_y. In the context
of Markov sequences, m_y is called the mean return time of y.

We will use the Renewal Theorem to analyze the behavior of the sequence
(T^k(x,y): k ≥ 0) for fixed states x,y. In terms of the random sequence Y,
T^k(x,y) = P^x[Y_k = 1], so (T^k(y,y): k ≥ 0) is the potential sequence for the
renewal sequence corresponding to y. It follows immediately from the Renewal
Theorem that


\[ \lim_{k \to \infty} T^{k\gamma_y}(y,y) = \frac{\gamma_y}{m_y}, \]


provided that γ_y < ∞. By the definition of ‘period’, T^k(y,y) = 0 if k is not an
integer multiple of γ_y, so


\[ \lim_{k \to \infty} \Bigl( \frac{1}{\gamma_y} \sum_{i=0}^{\gamma_y - 1} T^{k+i}(y,y) \Bigr) = \frac{1}{m_y} \tag{26.5} \]


whether or not γ_y < ∞. We have almost proved the following result.


Theorem 10. [Renewal for Markov sequences] Let T be a transition operator
for a countable state space Ψ. Then for all states x,y ∈ Ψ,


\[ \lim_{k \to \infty} \Bigl( \frac{1}{\gamma_y} \sum_{i=0}^{\gamma_y - 1} T^{k+i}(x,y) \Bigr) = \frac{\pi_{xy}}{m_y}. \tag{26.6} \]


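PROOF. Since $\pi_{yy} = 1$, the case $x = y$ follows from (26.5). The case $\gamma_y = \infty$
is trivial, since at most 1 visit to the state is possible and $m_y = \infty$ for that case.
It remains to consider general $x$ and $y$ with $\gamma_y < \infty$. Let
$\tau = \inf\{n \ge 0 : X_n = y\}$. By the strong Markov property,

$$\begin{aligned}
T^{k+i}(x,y) &= \sum_{n=0}^{k+i} P^x[X_{k+i} = y \text{ and } \tau = n] \\
&= \sum_{n=0}^{k+i} P^x[\tau = n]\, P^y[X_{k+i-n} = y] \\
&= \sum_{n=0}^{\infty} P^x[\tau = n]\, T^{k+i-n}(y,y) ,
\end{aligned}$$

where, in the last sum, $T^{j}(y,y)$ is taken to be 0 for $j < 0$.
We substitute this last expression into (26.6) and then use the Dominated
Convergence Theorem for sums to justify bringing the limit inside the infinite sum.
Since

$$\pi_{xy} = \sum_{n=0}^{\infty} P^x[\tau = n] ,$$

the desired result now follows from the case for which $x = y$. □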
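
The role of the average over one period in (26.6) can be seen in a small numerical sketch. The example chain below is an assumption chosen for illustration and does not come from the text: simple random walk on a cycle of four states, which has period 2, mean return time 4, and hitting probability 1 between any two states. The helper names `mat_mul`, `mat_pow`, and `period_average` are hypothetical.

```python
from fractions import Fraction

def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(t, k):
    """Return the k-th power of the square matrix t, for an integer k >= 0."""
    n = len(t)
    result = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, t)
    return result

def period_average(t, x, y, k, period):
    """Average of T^{k+i}(x, y) over i = 1, ..., period."""
    return sum(mat_pow(t, k + i)[x][y] for i in range(1, period + 1)) / period

half = Fraction(1, 2)
# Illustrative chain (an assumption): simple random walk on the cycle 0-1-2-3-0.
T = [[0, half, 0, half],
     [half, 0, half, 0],
     [0, half, 0, half],
     [half, 0, half, 0]]

# The individual powers oscillate between two values ...
print(mat_pow(T, 10)[0][0], mat_pow(T, 11)[0][0])
# ... while the average over one period equals pi_xy / m_y = 1/4.
print(period_average(T, 1, 0, 10, 2))
```

For this particular chain the period average already equals the limit for every $k \ge 1$; in general one only gets convergence as $k \to \infty$.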


Problem 34. Prove that every time-homogeneous Markov sequence with finite state 
space has at least one recurrent state. 


Problem 35. Use the Strong Law for Renewal Sequences to determine the asymptotic
behavior of the number of visits to a state $y$ made by a time-homogeneous
Markov sequence with initial state $x$.




26.6. Irreducible Markov sequences 


We introduce some terminology that will be useful in further analyzing the 
asymptotic behavior of Markov sequences. Although we do not assume in this 
section that the state spaces are countable, the terminology is generally useful 
only in the countable case. 

If $\pi_{xy} > 0$, then we say that the state $y$ is accessible from the state $x$. We
call a transition operator and corresponding Markov sequence irreducible if all
states are accessible from one another.

It is an immediate consequence of the forthcoming Problem 39 that when the 
transition operator is irreducible, all states have the same period and are of the 
same type as far as recurrence is concerned. Thus one can speak of the period 
of an irreducible transition operator, identify it as aperiodic or not, and classify 
it as positive recurrent, null recurrent, or transient. Of course we can apply the 
same terminology to the irreducible Markov sequence itself. 


Problem 36. Prove that a necessary and sufficient condition for a state $x$ to be
transient is that there exists a state $y \ne x$ that is accessible from $x$ and for which
$\pi_{yx} < 1$.


Problem 37. Using the necessary and sufficient conditions for transience given in 
the preceding problem, write a statement that gives necessary and sufficient con- 
ditions for a state to be recurrent. 


Problem 38. Prove that if $x$ and $y$ are recurrent states, then either $\pi_{xy} = \pi_{yx} = 1$
or $\pi_{xy} = \pi_{yx} = 0$. Also, prove that $\pi_{xz} = 0$ if $x$ is recurrent and $z$ is transient.


Problem 39. Show that if two states are each accessible from one another, then 
they both have the same period and are of the same type—positive recurrent, null 
recurrent, or transient. 


Problem 40. Call recurrent states $x$ and $y$ equivalent if $\pi_{xy} = 1$. Prove that this
relationship is, in fact, an equivalence relation among the recurrent states.


The equivalence classes induced by the equivalence relation of the preceding
problem are called irreducible recurrence classes. A Markov sequence is sometimes
analyzed according to the following scheme: (i) the irreducible recurrence
classes are identified; (ii) for each transient state the probability is calculated
(using Theorem 8, for instance) of hitting each irreducible recurrence class and
also of staying forever in the transient states; (iii) a study is made of each of the
irreducible Markov sequences obtained by considering the irreducible recurrence
classes to be separate state spaces, a legitimate procedure in view of Problem 38.


Problem 41. Show that if $y$ and $z$ are in the same irreducible recurrence class,
then $\pi_{xy} = \pi_{xz}$ for all states $x$.




Problem 42. For each of the Markov sequences of Example 4 there is a unique 
subset Y of the state space such that the Markov sequence restricted to Y is irre- 
ducible. Describe Y in terms of the distribution p of that example. Also, for each 
y € Zt \ {0} find three distributions p for which {Y = oo and the corresponding 
transition operator has period y: one for which the transition operator is positive 
recurrent, one for which it is null recurrent, and one for which it is transient. 


Problem 43. Show that if $T$ is a transition operator with finite state space, then
$T$ is irreducible and aperiodic if and only if for some positive integer $k$, all entries
of the matrix $T^k$ are positive.


Problem 44. Show that if $T$ is a transition operator with countable state space,
then $T$ is irreducible and aperiodic if and only if for every pair of states $x, y$, there
exists a positive integer $k$ such that $T^n(x,y) > 0$ for all $n \ge k$.


Problem 45. Let $T$ be an irreducible transition operator with countable state space
and suppose that $T(x,x) > 0$ for some $x$. Prove that $T$ is aperiodic. Also show
that it is possible for $T$ to be aperiodic even if $T(x,x) = 0$ for all $x$.


Problem 46. Let $T$ be an irreducible recurrent transition operator with countable
state space and period $k$. Prove that $T^k$ is an aperiodic transition operator with
no transient states and $k$ irreducible recurrence classes.


26.7. Equilibrium distributions 


Let $T$ be a transition operator with state space $\Psi$. If $Q_0$ is a probability
distribution on $\Psi$, then we call $Q_0$ an equilibrium distribution for $T$ if $Q_0 T = Q_0$.
The following fact is easily proved:


Definition and Proposition 11. If $Q_0$ is an equilibrium distribution for $T$ and if $X$ is
a Markov sequence with transition operator $T$ and initial distribution $Q_0$, then
$Q_0$ is the distribution of $X_k$ for all $k \ge 0$, and $X$ is called a stationary Markov
sequence.


Problem 47. Let $Q_0$ be an equilibrium distribution for a transition operator $T$.
Show that if $x$ is a transient or null recurrent state, then $Q_0\{x\} = 0$.


Problem 48. Let $T$ be a transition operator with state space $\Psi$. Assume that $\Psi$ is
a compact Polish space. (Polish spaces are introduced and treated in Chapter 18.)
Show that there exists at least one equilibrium distribution for $T$. Hint: Let $X$ be
any Markov sequence with transition operator $T$, and let $Q_n$ be the distribution
of $X_n$, $n \ge 0$. For $n > 0$, define probability distributions $R_n$ by

$$R_n(A) = \frac{1}{n} \sum_{k=0}^{n-1} Q_k(A) .$$

Show that every limit point of the sequence $(R_n : n > 0)$ is an equilibrium
distribution for $T$.




We now focus on equilibrium distributions in the setting of countable state
spaces. There is a strong connection between these distributions and the behavior
of corresponding Markov sequences $(X_0, X_1, \dots)$ for large time. The distribution
of $X_n$ is represented by the matrix $Q_0 T^n$, where $Q_0$ is the row matrix that
represents the distribution of $X_0$ and $T$ is the transition matrix. Thus we will be
interested in the behavior of the powers of $T$. The Renewal Theorem for Markov
sequences will be our main tool. An immediate consequence of that theorem is

$$\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} T^k(x,y) = \frac{\pi_{xy}}{m_y} \overset{\text{def}}{=} A(x,y) \tag{26.7}$$

for all $x, y \in \Psi$.


Proposition 12. For the matrix $A$ given in (26.7), all rows corresponding
to states in the same irreducible recurrence class are identical, and all entries in
columns corresponding to null recurrent and transient states equal 0. Moreover,

• if $x$ is positive recurrent, then $A(x,y) = 1/m_y$ for $y$ in the irreducible
  recurrence class of $x$ and $A(x,y) = 0$ for other $y$;

• if $x$ is null recurrent, then $A(x,y) = 0$ for all $y$;

• if $x$ is transient, then row $x$ is a linear combination of rows corresponding
  to various irreducible recurrence classes, where the weight corresponding
  to a given recurrence class is $\pi_{xy}$ for any choice of $y$ from that class.
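
As a numerical illustration of (26.7), the averages of the powers of $T$ can be computed directly for a small chain. The following is only a sketch under stated assumptions: the three-state matrix is a made-up example (states 0 and 1 absorbing, state 2 transient and equally likely to be absorbed at 0 or at 1), and `cesaro_average` is a hypothetical helper name.

```python
def cesaro_average(t, n):
    """Return (1/n) * (T + T^2 + ... + T^n) for a square matrix t."""
    size = len(t)
    power = [row[:] for row in t]            # current power T^k, starting with k = 1
    total = [[0.0] * size for _ in range(size)]
    for _ in range(n):
        for i in range(size):
            for j in range(size):
                total[i][j] += power[i][j]
        # Advance from T^k to T^(k+1).
        power = [[sum(power[i][k] * t[k][j] for k in range(size))
                  for j in range(size)] for i in range(size)]
    return [[entry / n for entry in row] for row in total]

# Illustrative chain (an assumption): states 0 and 1 absorbing, state 2 transient.
T = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.25, 0.25, 0.5]]

A_approx = cesaro_average(T, 2000)
for row in A_approx:
    print([round(entry, 2) for entry in row])
```

Here $m_0 = m_1 = 1$ and $\pi_{20} = \pi_{21} = 1/2$, so the third row of the average should be close to $(1/2, 1/2, 0)$, with the column of the transient state tending to 0. The error of such an average decays only like $1/n$.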


Problem 49. Prove the preceding proposition. Hint: Use Problem 38, Problem 39, 
and Problem 41. 


Proposition 13. For $T$ a transition matrix, let the matrix $A$ be defined by
(26.7). Then $TA = AT = A^2 = A$. Moreover, a probability distribution $Q_0$ is
an equilibrium distribution for $T$ if and only if $Q_0 A = Q_0$.


PROOF. By treating each row of $T$ as a probability distribution, we can apply
the Bounded Convergence Theorem to obtain

$$TA = \lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} T^{k+1}
= \lim_{n\to\infty} \Big( \frac{n+1}{n} \cdot \frac{1}{n+1} \sum_{k=1}^{n+1} T^{k} - \frac{1}{n}\, T \Big) = A .$$

By the Fatou Lemma, every entry of the matrix $AT$ is less than or equal to
the corresponding entry of the matrix $A$ and the sum of the entries in any row of
$A$ is finite (in fact, less than or equal to 1). We finish the argument that $AT = A$
by the following calculation showing that the sum of the entries in any row of
$AT$ equals the sum of the entries in the corresponding row of $A$:

$$\sum_{z} \sum_{y} A(x,y)\, T(y,z) = \sum_{y} \sum_{z} A(x,y)\, T(y,z) = \sum_{y} A(x,y) .$$




From the preceding paragraph, we obtain $A = AT = AT^2 = \cdots$ and thus

$$A = A \Big( \frac{1}{n} \sum_{k=1}^{n} T^k \Big) .$$

By treating each row of $A$ as a finite measure, we can apply the Bounded
Convergence Theorem to obtain $A = A^2$.
Suppose that $Q_0$ is an equilibrium distribution. Then, viewing $T$ as a right
transition operator,

$$Q_0 = Q_0 T = Q_0 T^2 = \cdots$$

and thus

$$Q_0 = Q_0 \Big( \frac{1}{n} \sum_{k=1}^{n} T^k \Big) .$$

By the Bounded Convergence Theorem, $Q_0 A = Q_0$.
Conversely, suppose that $Q_0 = Q_0 A$. Operate on the right by $T$ to obtain

$$Q_0 T = (Q_0 A) T = Q_0 (AT) = Q_0 A = Q_0 . \qquad □$$


Proposition 12 shows that, in some sense, the interesting rows of A are those 
that correspond to positive recurrent states since all other rows can be written 
as (possibly trivial) linear combinations of those rows. The following theorem 
connects this fact to the equilibrium distributions for T. 




Theorem 14. For a transition matrix $T$, let the matrix $A$ be defined by
(26.7). The equilibrium distributions of $T$ are the countable convex combinations
of the rows of $A$ that correspond to positive recurrent states. The row
corresponding to any particular positive recurrent state is the unique equilibrium
distribution supported by the irreducible recurrence class containing that state.


PARTIAL PROOF. Let $x$ be a positive recurrent state. Since $A = A^2$, the
entry in position $(x,x)$ of the matrix $A$ equals the corresponding entry of $A^2$.
Because of Proposition 12 this equality can be written as

$$\frac{1}{m_x} = \sum_{y} A(x,y)\, A(y,x) . \tag{26.8}$$

By Proposition 12, the first factor in any of the summands is nonzero only if $y$
is in the same irreducible recurrence class as $x$, in which case the second factor
equals $1/m_x$. Hence (26.8) becomes

$$\frac{1}{m_x} = \frac{1}{m_x} \sum_{y} A(x,y) .$$

Multiply both sides by the finite number $m_x$ to obtain $\sum_{y} A(x,y) = 1$.
We leave the rest of the proof to the reader. □


Problem 50. Use Proposition 12 and Proposition 13 to finish the proof of Theo- 
rem 14. 




The following is a corollary of the preceding theorem and the Renewal Theo- 
rem for Markov sequences. 


Corollary 15. Let $T$ be an irreducible transition matrix on a countable state
space $\Psi$. Then there exists an equilibrium distribution for $T$ if and only if $T$
is positive recurrent, in which case there is a unique equilibrium distribution $Q_0$
given by $Q_0\{x\} = 1/m_x$. If in addition $T$ is aperiodic, then for any Markov
sequence $X$ with transition matrix $T$,

$$\lim_{n\to\infty} P[X_n = x] = \frac{1}{m_x} .$$
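
The relation $Q_0\{x\} = 1/m_x$ can be checked by hand on a two-state chain, and the following sketch does the arithmetic exactly. The jump probabilities `a` and `b` are arbitrary illustrative values, the helper `row_times_matrix` is a hypothetical name, and the mean return times are obtained by an elementary first-step argument rather than by any method taken from the text.

```python
from fractions import Fraction

# Illustrative two-state chain (an assumption): jump 0 -> 1 with probability a,
# jump 1 -> 0 with probability b. It is irreducible and aperiodic.
a, b = Fraction(1, 3), Fraction(1, 5)
T = [[1 - a, a],
     [b, 1 - b]]

def row_times_matrix(q, t):
    """Row vector q times matrix t."""
    return [sum(q[i] * t[i][j] for i in range(len(q))) for j in range(len(t[0]))]

# Candidate equilibrium distribution, found by solving Q0 T = Q0 by hand.
Q0 = [b / (a + b), a / (a + b)]

# Mean return times by first-step analysis: one step, plus (after a jump to
# the other state) a geometric waiting time for the jump back.
m0 = 1 + a / b
m1 = 1 + b / a

print(row_times_matrix(Q0, T) == Q0)     # Q0 is an equilibrium distribution
print(Q0 == [1 / m0, 1 / m1])            # and Q0{x} = 1/m_x

# Distribution of X_n started from state 0, for a moderately large n.
dist = [Fraction(1), Fraction(0)]
for _ in range(60):
    dist = row_times_matrix(dist, T)
print(float(dist[0]), float(Q0[0]))
```

The last line compares the distribution of $X_{60}$ started at state 0 with $Q_0$; the two agree to many decimal places because the chain is aperiodic.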


Example 8. Let $T$ be a transition matrix with countable state space $\Psi$. A
state $x \in \Psi$ is called absorbing if $T(x,x) = 1$. Suppose that all of the states in $\Psi$
are either absorbing or transient, and denote the number of absorbing states by
$a$ and the number of transient states by $t$. (By Problem 34, $a > 0$ if $\Psi$ is finite.)
Label the absorbing states (if any) by the integers $1, \dots, a$ and the transient
states by $a+1, \dots, a+t$. Then

$$T = \begin{pmatrix} I_a & 0 \\ B & C \end{pmatrix} ,$$

where for any positive integer $l$, $I_l$ is the $l \times l$ identity matrix, 0 is a zero matrix,
and $B$ and $C$ are, respectively, $t \times a$ and $t \times t$ matrices. Since each transient
state can only be visited finitely many times with probability 1, it is easy to see
that the entries of $C^k$ converge to 0 as $k \to \infty$. Elementary matrix calculations
then give that

$$\lim_{k\to\infty} T^k = \begin{pmatrix} I_a & 0 \\ M & 0 \end{pmatrix} ,$$

where $M$ is the matrix $(I_t - C)^{-1} B$. In making this calculation, we have used
the formula $(I_t - C)^{-1} = I_t + C + C^2 + \cdots$. The entries of the $t \times a$ matrix $M$ are
the absorption probabilities. Thus, $M(i,j)$ is the probability that the sequence
with transition matrix $T$ and transient initial state $i$ eventually hits (and gets
stuck at) the absorbing state $j$. Also,

$$1 - \sum_{j=1}^{a} M(i,j)$$

equals the probability that the sequence starting in the transient state $i$ stays
among the transient states forever. Of course, this probability equals 0 if $t < \infty$.
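
The formula $M = (I_t - C)^{-1} B$ can be evaluated exactly for small chains. The sketch below is an illustration under stated assumptions: the chain is a fair gambler's ruin on $\{0, 1, 2, 3\}$ (not an example from the text), with the absorbing states 0 and 3 listed first and the transient states 1 and 2 listed last; `solve` and `absorption_probabilities` are hypothetical helper names. Instead of forming the inverse, it solves $(I_t - C) M = B$ by Gauss-Jordan elimination.

```python
from fractions import Fraction

def solve(a, b):
    """Solve a @ m = b exactly by Gauss-Jordan elimination (a square, invertible)."""
    n = len(a)
    # Augmented matrix [a | b] with exact rational entries.
    aug = [[Fraction(v) for v in a[i]] + [Fraction(v) for v in b[i]]
           for i in range(n)]
    for col in range(n):
        # Find a row with a nonzero pivot and move it into place.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        # Clear the pivot column in every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def absorption_probabilities(b, c):
    """Return M = (I - C)^{-1} B for the blocks B and C of Example 8."""
    t = len(c)
    i_minus_c = [[int(i == j) - c[i][j] for j in range(t)] for i in range(t)]
    return solve(i_minus_c, b)

half = Fraction(1, 2)
# Rows: transient states 1 and 2. Columns of B: absorbing states 0 and 3.
B = [[half, 0],
     [0, half]]
# Columns of C: transient states 1 and 2.
C = [[0, half],
     [half, 0]]

M = absorption_probabilities(B, C)
print(M)
```

For this chain the rows of $M$ sum to 1, in agreement with the remark that the probability of staying among the transient states forever is 0 when $t < \infty$.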




Problem 51. Calculate the absorption probabilities for the transition matrix 


100 0 0 
0 10 0 0 
po do 
i 2 9 0 0 
00 4 3 4 


* Problem 52. Calculate the absorption probabilities for the transition matrix 


1 0 0 0 0 0 0 
0 1 0 0 0 0 0 
20 
szam 0 0 Saa O0 0 0 
1 3-2 -2 
0 3-21-] 0 0 3-21] : 0 
1 3:2°—2 
3-22] 0 0 0 0 3-221 0 
bew G -oe p aa 


starting at the state represented by the third row and column of the matrix. Also, 
find the probability of staying among the transient states forever starting at that 
state. 


Problem 53. Explain how the ideas in Example 8 can be used to help calculate 
the rows of the matrix A in Theorem 14 that correspond to transient states. Hint: 
Think of each irreducible recurrence class as a kind of absorbing state. 


Problem 54. Consider a birth-death sequence for which the transition distributions
$\mu_x$ satisfy $\mu_x\{x - 1\} > 0$ for $x \ne 0$. Prove that there is an equilibrium distribution
if and only if

$$r \overset{\text{def}}{=} \sum_{x=0}^{\infty} \prod_{z=1}^{x} \frac{\mu_{z-1}\{z\}}{\mu_z\{z-1\}} < \infty ,$$

in which case there is a unique equilibrium distribution $Q_0$ given by

$$Q_0\{x\} = \frac{1}{r} \prod_{z=1}^{x} \frac{\mu_{z-1}\{z\}}{\mu_z\{z-1\}} .$$


* Problem 55. [Ehrenfest urn sequence] In this model, there are two urns containing
a total of $b$ balls. The state of the system is the number of balls in the first urn,
so the state space is $\{0, \dots, b\}$. At each time unit, a ball is chosen at random
from among all of the balls, and then the chosen ball is moved from the urn that
contains it to the other urn. Thus, the entries in the transition matrix are given
by

$$T(x, x-1) = \frac{x}{b} , \quad 0 < x \le b , \qquad
T(x, x+1) = \frac{b-x}{b} , \quad 0 \le x < b .$$

Find the unique equilibrium distribution for $T$. Hint: Use the preceding problem.


CHAPTER 27 
Exchangeable Sequences 


Recall from Definition 4 of Chapter 22 that a finite or infinite sequence is ex- 
changeable if its distribution is invariant under permutations of its terms. Our 
main goal in this chapter is to develop some simple ways to describe all ex- 
changeable sequences. We will see that an infinite sequence is exchangeable if 
and only if it is conditionally iid, and that finite exchangeable sequences can be 
described in terms of a certain urn model. The characterization results, known 
as De Finetti Theorems, were foreshadowed in Example 2 and Problem 9 of 
Chapter 22. Problem 10 of Chapter 22 provides an example of a finite exchange- 
able sequence that is not conditionally iid, thereby showing that the finite case 
cannot be derived from the infinite case. We focus on exchangeable sequences 
whose terms take values in a finite set, and describe how some of the formulas 
obtained for such sequences can be used to obtain information about unknown 
parameters. Later we generalize to sequences whose terms take values in a Borel 
space. In the last section, we introduce some important distributions that are 
naturally connected with infinite exchangeable sequences and an urn model. 


27.1. Finite exchangeable sequences 


The following result describes how to form a finite exchangeable sequence from 
an arbitrary finite sequence of random variables. The construction is rather 
natural in settings in which there is no intrinsic order among the indices of the 
random variables. 


Proposition 1. Let $n$ be a positive integer and let $(Z_1, \dots, Z_n)$ be a sequence
of random variables defined on a common probability space and taking values in
a common Borel space. Let $\Pi$ be a random permutation of the set $\{1, 2, \dots, n\}$
with each permutation having probability $1/n!$. Furthermore, suppose that $\Pi$ is
independent of the sequence $(Z_1, Z_2, \dots, Z_n)$. Then

$$(Z_{\Pi(1)}, Z_{\Pi(2)}, \dots, Z_{\Pi(n)}) \tag{27.1}$$

is a finite exchangeable sequence.




PROOF. Let $\rho$ be a fixed permutation of $\{1, 2, \dots, n\}$. Then the distribution
of $\rho \circ \Pi$ is uniform over the set of permutations; that is, the distribution of $\rho \circ \Pi$
is the same as the distribution of $\Pi$. Also, each of $\Pi$ and $\rho \circ \Pi$ is independent of
$(Z_1, \dots, Z_n)$. Therefore the distribution of the vector $(Z_{\rho(\Pi(1))}, \dots, Z_{\rho(\Pi(n))})$ is
the same as the distribution of the vector (27.1). □
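
For a deterministic sequence $(Z_1, \dots, Z_n)$, the construction of Proposition 1 can be carried out by brute force, which gives a quick sanity check of the statement. The sketch below enumerates all $n!$ permutations; the function names `shuffled_distribution` and `is_exchangeable` are hypothetical, and the restriction to a fixed (nonrandom) input is an assumption made to keep the example short.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

def shuffled_distribution(z):
    """Distribution of (z[pi(0)], ..., z[pi(n-1)]) for a uniform random permutation pi."""
    perms = list(permutations(range(len(z))))
    weight = Fraction(1, len(perms))
    dist = Counter()
    for pi in perms:
        dist[tuple(z[i] for i in pi)] += weight
    return dist

def is_exchangeable(dist):
    """Check that a finite distribution on tuples is invariant under permuting coordinates."""
    for outcome, prob in dist.items():
        for pi in permutations(range(len(outcome))):
            if dist.get(tuple(outcome[i] for i in pi), 0) != prob:
                return False
    return True

dist = shuffled_distribution((1, 1, 2))
print(dict(dist))
print(is_exchangeable(dist))
print(is_exchangeable({(1, 2): Fraction(1)}))   # a fixed ordered pair is not exchangeable
```

A random $(Z_1, \dots, Z_n)$ independent of $\Pi$ yields a mixture of distributions of this kind, and a mixture of permutation-invariant distributions is again permutation invariant.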


For the next two sections and the remainder of this section we consider
exchangeable sequences having values in a finite set, and after that we return to
general Borel spaces. The names of the members of the finite set are fundamentally
irrelevant, but it will usually be notationally useful to call them $1, 2, \dots, d$
for some $d \in \mathbb{Z}^+ \setminus \{0\}$.


Definition 2. Let $X = (X_1, \dots, X_n)$ be a finite exchangeable sequence taking
values in $\{1, 2, \dots, d\}$. The empirical density of $X$ is the random vector

$$Y = \frac{1}{n} \sum_{k=1}^{n} e_{X_k} ,$$

where $e_i$, $1 \le i \le d$, are the standard basis vectors in $\mathbb{R}^d$. The De Finetti measure
of $X$ is the distribution of $Y$.


Let $Y^i$ denote the $i$th coordinate of $Y$. The word 'density' is justified in
the preceding definition by the observation that the function $i \rightsquigarrow Y^i$ is the
density with respect to counting measure on $\{1, \dots, d\}$ of the random distribution
corresponding to the empirical distribution function defined in the paragraph
preceding Theorem 15 of Chapter 12.

The values of an empirical density belong to

$$\Delta^{(d-1)} \overset{\text{def}}{=} \{ (y_1, \dots, y_d) : y_1 + \cdots + y_d = 1,\ y_i \ge 0 \text{ for each } i \} ,$$

called the standard $(d-1)$-simplex, and the De Finetti measure is a distribution
on $\Delta^{(d-1)}$.
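
For a single observed sequence, the empirical density is just the vector of relative frequencies of the values $1, \dots, d$. A minimal sketch (the function name `empirical_density` is hypothetical) makes the definition concrete and shows that the result lies in the standard $(d-1)$-simplex with coordinates that are multiples of $1/n$:

```python
from fractions import Fraction

def empirical_density(x, d):
    """Return (1/n) * sum of e_{x_k}: the relative frequencies of 1, ..., d in x."""
    n = len(x)
    return tuple(Fraction(sum(1 for value in x if value == i), n)
                 for i in range(1, d + 1))

y = empirical_density((2, 1, 2, 2), d=3)
print(y)             # coordinates 1/4, 3/4, 0
print(sum(y) == 1)   # the point lies in the standard 2-simplex
```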


Problem 1. Let $X_k$ denote 1 + the indicator random variable of 'heads on flip $k$'
of a sequence of $n$ independent flips of a fair coin. Find the De Finetti measure of
the sequence $(X_1, X_2, \dots, X_n)$.


* Problem 2. Let $X_1$ be the result of the roll of a fair die and $X_2 = 7 - X_1$. Show
that $(X_1, X_2)$ is exchangeable and find its De Finetti measure.


Problem 3. Fix an integer $n \in [2, 6]$. Consider an infinite sequence of independent
rolls of a fair die. Let $X_1$ be the result of the first roll, and for $2 \le m \le n$, let
$X_m$ be the result of the first roll that is different from $X_k$, $1 \le k < m$. Find the
De Finetti measure of $(X_1, \dots, X_n)$.


* Problem 4. Here is a way of generating an exchangeable sequence of 96 $\mathbb{Z}$-valued
random variables. For $1 \le z \le 12$, place 8 balls labelled $z$ in an urn and draw
the resulting 96 balls out of the urn one at a time without replacement, noting the
label on each ball as it is drawn. Find the De Finetti measure.




Problem 5. Suppose that 48 balls are drawn at random from a pile of the 96 balls 
described in the preceding exercise and that these 48 balls are then placed in an 
urn from which 48 drawings are made one at a time without replacement. Find 
the De Finetti measure for the resulting exchangeable sequence of length 48. 


* Problem 6. Four balls numbered 1, 2, 3, and 4 are placed in an urn and then 4 
balls are drawn with or without replacement according as a single independent flip 
of a coin is heads or tails. Is the random sequence of length 4 thus obtained an 
exchangeable sequence? If so, find its De Finetti measure. 


Problem 7. Modify the preceding problem by replacing the single flip by three 
independent flips, one after each of the first three draws to determine whether the 
ball drawn on that flip is to be replaced. 


Problem 8. Let (Z1,..., Zn) be a sequence of {1,...,d}-valued random variables. 
In terms of the distribution of (Z1,..., Zn), describe the De Finetti measure of the 
exchangeable sequence constructed from (Z1,..., Zn) as in Proposition 1. 


A probability measure on $\Delta^1$ is a distribution on the set of pairs of the form
$(p, (1-p))$, $0 \le p \le 1$. Since the second coordinate of such an ordered pair is
determined by the first coordinate, a distribution on $\Delta^1$ may also be regarded
as a distribution on $[0,1]$. Whenever we wish to adopt this point of view, we
indicate our desire by a change of notation, in that we replace an exchangeable
sequence of $\{1,2\}$-valued random variables with the equivalent exchangeable
sequence of Bernoulli (that is, $\{0,1\}$-valued) random variables obtained by
subtracting 1 from each member of the sequence. We also replace the De Finetti
measure of the original sequence by the distribution on $[0,1]$ obtained by taking
the first-coordinate marginal of the original De Finetti measure. We will
distinguish the viewpoint of this paragraph from that arising from Definition 2 by
saying whether the De Finetti measure is a distribution on $[0,1]$ or on $\Delta^1$, or
alternatively whether the empirical density is $[0,1]$-valued or $\Delta^1$-valued. Note
that the De Finetti measure on $[0,1]$ equals the distribution of $\frac{1}{n} \sum_{k=1}^{n} X_k$, where
$(X_1, \dots, X_n)$ is the exchangeable sequence of Bernoulli random variables.


Problem 9. Treat Problem 1 along the lines suggested by the preceding paragraph, 
and express the De Finetti measure as a distribution on [0, 1]. 


* Problem 10. In case (Z1, Z2,...,Zn) is a sequence of standard Bernoulli random 
variables, describe the answer to Problem 8 as a distribution on [0, 1]. 


Problem 8 and Problem 10 might make one ask: What role is exchangeability 
playing in the definition of ‘empirical density’ and ‘De Finetti measure’? One 
answer, as we will soon see, is that under the assumption of exchangeability, the 
distribution of a sequence is determined by its De Finetti measure, which would 
not be the case without this assumption. 




From Definition 2, it is clear that the De Finetti measure of an exchangeable
length-$n$ sequence of $\{1, \dots, d\}$-valued random variables is supported by the set
of points in $\Delta^{(d-1)}$ having coordinates that are integral multiples of $1/n$. This
assertion and its converse constitute part of the following result.


Theorem 3. [De Finetti (finite case, special version)] The distribution of a
length-$n$ exchangeable sequence of $\{1, \dots, d\}$-valued random variables is uniquely
determined by its De Finetti measure, and the family of such De Finetti measures
consists of the probability measures on the standard $(d-1)$-simplex that are
supported by the set of points whose coordinates are integral multiples of $1/n$.


PROOF. As mentioned in the paragraph preceding the theorem, all De Finetti
measures of the exchangeable sequences of the theorem have the property
described in the theorem.

Let $\mu$ denote a probability measure that is supported by the set of points in
$\Delta^{(d-1)}$ whose coordinates are integral multiples of $1/n$, and let $Y$ be a random
vector with distribution $\mu$. Let $\Pi$ be a uniformly distributed random permutation
of $\{1, 2, \dots, n\}$ that is independent of $Y$. There exist $\{1, \dots, d\}$-valued random
variables $Z_1 \le Z_2 \le \cdots \le Z_n$ such that

$$Y = \frac{1}{n} \sum_{k=1}^{n} e_{Z_k} .$$

Set $X_m = Z_{\Pi(m)}$ for $m = 1, 2, \dots, n$. By Proposition 1, $(X_1, X_2, \dots, X_n)$ is
exchangeable. It is clear that $Y$ is the empirical density of $(X_1, \dots, X_n)$.

For the uniqueness, let $Y$ denote the empirical density of an exchangeable
sequence $(X_1, \dots, X_n)$. Since $Y$ is unchanged under permutations of $(X_1, \dots, X_n)$,
in order that

$$P[Y = y,\ X_m = x_m \text{ for } 1 \le m \le n]$$

be unchanged under such permutations it is necessary that

$$P[X_m = x_m \text{ for } 1 \le m \le n \mid Y = y]$$

be unchanged under these permutations whenever $P[Y = y] > 0$. Thus this
conditional probability must be uniform on the set of $(x_m : 1 \le m \le n)$ for
which

$$y = \frac{1}{n} \sum_{m=1}^{n} e_{x_m} .$$

Therefore the distribution of an exchangeable sequence $(X_1, \dots, X_n)$ is
determined by the distribution of its empirical density (that is, by its De Finetti
measure). □


The following problem shows that the construction in the preceding proof is 
equivalent to a certain urn model. 




Problem 11. Discuss the following urn model in connection with Theorem 3. For
$1 \le i \le d$, $nY^i$ balls of color $i$ are placed in an urn, and then the urn is emptied
one ball at a time. (Here, $Y^i$ denotes the $i$th coordinate of the random vector $Y$.)


For the forthcoming collection of problems involving explicit formulas, the
following notation is useful:

$$\binom{n}{r_1\ r_2\ \cdots\ r_d} \overset{\text{def}}{=} \frac{n!}{r_1!\, r_2! \cdots r_d!} \quad \text{with } n = \sum_{i=1}^{d} r_i .$$

This, the coefficient of $a_1^{r_1} a_2^{r_2} \cdots a_d^{r_d}$ in the expansion of $(a_1 + a_2 + \cdots + a_d)^n$,
is called a multinomial coefficient.
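
The multinomial coefficient counts the sequences in $\{1, \dots, d\}^n$ that have a given vector of counts, so the uniqueness argument in the proof of Theorem 3 (uniform conditional distribution over such sequences) can be turned into a small computation. In the sketch below the De Finetti measure is represented, as an assumption of convenience, by a dictionary `mu` mapping count vectors $(ny_1, \dots, ny_d)$ to probabilities; the names `multinomial` and `sequence_distribution` are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import factorial

def multinomial(counts):
    """Multinomial coefficient n! / (r_1! ... r_d!) with n = sum of the counts."""
    result = factorial(sum(counts))
    for c in counts:
        result //= factorial(c)
    return result

def sequence_distribution(mu, n, d):
    """Distribution of the exchangeable length-n sequence with De Finetti measure mu.

    mu maps count vectors (n*y_1, ..., n*y_d) to probabilities; each sequence
    with a given count vector receives an equal share of that probability.
    """
    dist = {}
    for x in product(range(1, d + 1), repeat=n):
        counts = tuple(x.count(i) for i in range(1, d + 1))
        p = mu.get(counts, 0)
        if p:
            dist[x] = Fraction(p) / multinomial(counts)
    return dist

# Illustrative measure (an assumption): n = 3, d = 2, mass 1/2 on each of the
# points (1, 0) and (1/3, 2/3) of the simplex, written as count vectors.
mu = {(3, 0): Fraction(1, 2), (1, 2): Fraction(1, 2)}
dist = sequence_distribution(mu, n=3, d=2)
for x, p in sorted(dist.items()):
    print(x, p)
```

The enumeration over all $d^n$ sequences is only practical for small $n$ and $d$; it is meant to mirror the definition, not to be efficient.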


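Problem 12. Let $Y$, $n$, and $(X_1, \dots, X_n)$ be as in Theorem 3 and its proof, and
denote the $i$th coordinate of $Y$ by $Y^i$. Show that

$$P[X_m = x_m \text{ for } 1 \le m \le n \mid Y] =
\begin{cases}
\binom{n}{nY^1\ \cdots\ nY^d}^{-1} & \text{if } \sharp\{m : x_m = i\} = nY^i,\ 1 \le i \le d \\
0 & \text{otherwise.}
\end{cases} \tag{27.2}$$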


(Of Diente CH ated ye 


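Problem 13. Fix $m \ge 1$. Adjoin $w_i = \sharp\{k \le m : x_k = i\}$ for $1 \le i \le d$ to the
notation in the preceding problem. Show that

$$P[X_k = x_k \text{ for } 1 \le k \le m \mid Y] =
\binom{n-m}{nY^1 - w_1\ \cdots\ nY^d - w_d} \Big/ \binom{n}{nY^1\ \cdots\ nY^d}$$

if

$$w_i \le nY^i , \quad 1 \le i \le d ,$$

and

$$P[X_k = x_k \text{ for } 1 \le k \le m \mid Y] = 0$$

otherwise.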


Problem 14. Let $\mu$ be the De Finetti measure of a length-$n$ exchangeable sequence
$(X_1, \dots, X_n)$ of $\{1, 2, \dots, d\}$-valued random variables, denote the corresponding
empirical density by $Y$, and let $W_i = \sharp\{k \le m : X_k = i\}$. For $1 \le m \le n$ and
$w_i \in \mathbb{Z}^+$ for which $\sum_{i=1}^{d} w_i = n$, prove that

$$P[nY^i = w_i \text{ for } 1 \le i \le d \mid X_1, \dots, X_m]
= \frac{\dfrac{w_1! \cdots w_d!}{(w_1 - W_1)! \cdots (w_d - W_d)!}\ \mu\{\tfrac{1}{n}(w_1, \dots, w_d)\}}
{\displaystyle\sum_{z} \dfrac{z_1! \cdots z_d!}{(z_1 - W_1)! \cdots (z_d - W_d)!}\ \mu\{\tfrac{1}{n}(z_1, \dots, z_d)\}}$$

if $w_i \ge W_i$ for $1 \le i \le d$, and that this conditional probability equals 0 otherwise.
The case of a 0 denominator requires a comment.




* Problem 15. Let $(X_1, \dots, X_n)$ be an exchangeable sequence of Bernoulli random
variables whose De Finetti measure is the uniform distribution on $\{\frac{0}{n}, \frac{1}{n}, \dots, \frac{n}{n}\}$.
Calculate the distribution of $X_1$, and in case $n > 1$, the distribution of the pair
$(X_1, X_2)$ and the conditional distribution of $\frac{1}{n} \sum_{m=1}^{n} X_m$ given $\sigma(X_1, X_2)$.


Problem 16. Generalize Problem 15 to $\mathbb{R}^d$ for a De Finetti measure that is uniform
on the set of all members of $\Delta^{(d-1)}$ whose coordinates are integral multiples of $1/n$.


Problem 17. Let $X = (X_1, X_2, \dots, X_n)$ be an exchangeable sequence of Bernoulli
random variables, and suppose that $n \ge 3$. Calculate the distributions of $X_1$,
$(X_1, X_2)$, and $(X_1, X_2, X_3)$ in terms of the De Finetti measure of $X$. Present your
answer in two forms depending on whether the De Finetti measure is viewed as a
distribution on $\Delta^1$ or on the interval $[0,1]$.


Since the first $m$ terms, $m < n$, of an exchangeable sequence of length $n$
constitute an exchangeable sequence of length $m$, it is natural to ask how the
De Finetti measure of the shorter sequence is related to that of the longer
sequence. The answer is presented in the following result in which the natural
extension of 'martingale' to finite sequences of $\mathbb{R}^d$-valued random variables plays
a role. Later such an extension will be used for infinite sequences.


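Proposition 4. Let $\mu$ denote the De Finetti measure for a length-$n$ exchangeable
sequence $(X_1, \dots, X_n)$ of $\{1, \dots, d\}$-valued random variables. For $m \le n$,
set

$$Y_m = \frac{1}{m} \sum_{k=1}^{m} e_{X_k} , \tag{27.3}$$

and denote the distribution of $Y_m$ by $\nu_m$. Then $\nu_m$ is the De Finetti measure of
the exchangeable sequence $(X_1, X_2, \dots, X_m)$ and is related to $\mu$ via the formula

$$\nu_m\{\tfrac{1}{m}(w_1, \dots, w_d)\}
= \sum_{\substack{z_i \ge w_i \\ z_1 + \cdots + z_d = n}}
\frac{\binom{m}{w_1\ \cdots\ w_d} \binom{n-m}{z_1 - w_1\ \cdots\ z_d - w_d}}{\binom{n}{z_1\ \cdots\ z_d}}\
\mu\{\tfrac{1}{n}(z_1, \dots, z_d)\} \tag{27.4}$$

for $w_1, \dots, w_d \in \mathbb{Z}^+$ for which $\sum_{i=1}^{d} w_i = m$. Moreover, $(Y_1, Y_2, \dots, Y_n)$ is
a reverse martingale with respect to the reverse filtration $(\mathcal{G}_m : m = 1, 2, \dots, n)$,
where

$$\mathcal{G}_m = \sigma(Y_m, Y_{m+1}, \dots, Y_n) .$$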

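PROOF. By definition, $\nu_m$ is the De Finetti measure of $(X_1, X_2, \dots, X_m)$.
Clearly, $mY_m \le nY_n$ coordinate by coordinate. Thus

$$\nu_m\{\tfrac{1}{m}(w_1, \dots, w_d)\}
= \sum_{\substack{z_i \ge w_i \\ z_1 + \cdots + z_d = n}}
P[mY_m = (w_1, \dots, w_d) \mid nY_n = (z_1, \dots, z_d)]\ \mu\{\tfrac{1}{n}(z_1, \dots, z_d)\} .$$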




The conditional probability in this formula is equal to a product one factor of
which is the number of ways of choosing $(x_1, \dots, x_m)$ so that $(w_1, \dots, w_d)$,
defined in Problem 13, equals $mY_m$, and the other factor of which is the conditional
distribution obtained in that problem for any such $(x_1, \dots, x_m)$. The formula in
the proposition follows.

By linearity

$$\sum_{k=1}^{m} E(e_{X_k} \mid \mathcal{G}_m) = E(mY_m \mid \mathcal{G}_m) = mY_m . \tag{27.5}$$

Because $Y_m, \dots, Y_n$ all remain unchanged by permutations of $(X_1, \dots, X_m)$,
conditional exchangeability of the sequence $(X_1, \dots, X_m)$ given $\mathcal{G}_m$ follows from the
(unconditional) exchangeability of $(X_1, \dots, X_m)$. Hence the terms in the sum on
the left side of (27.5) are all equal, and each one of them equals $Y_m$. Therefore
for $2 \le m \le n$,

$$E(Y_{m-1} \mid \mathcal{G}_m) = \frac{1}{m-1} \sum_{k=1}^{m-1} E(e_{X_k} \mid \mathcal{G}_m) = Y_m ,$$

thus showing that $(Y_1, Y_2, \dots, Y_n)$ is a reverse martingale with respect to the
reverse filtration $(\mathcal{G}_1, \mathcal{G}_2, \dots, \mathcal{G}_n)$. □
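
Formula (27.4) is a mixture of multivariate hypergeometric probabilities, and it can be evaluated mechanically. The following sketch is an illustration under stated assumptions: De Finetti measures are represented by dictionaries keyed by count vectors ($nz$ for $\mu$ and $mw$ for $\nu_m$) rather than by points of the simplex, the measure `mu` is a made-up example, and the names `multinomial`, `count_vectors`, and `shorter_de_finetti` are hypothetical.

```python
from fractions import Fraction
from math import factorial

def multinomial(counts):
    """Multinomial coefficient n! / (r_1! ... r_d!) with n = sum of the counts."""
    result = factorial(sum(counts))
    for c in counts:
        result //= factorial(c)
    return result

def count_vectors(total, d):
    """Generate all tuples of d nonnegative integers that sum to total."""
    if d == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in count_vectors(total - first, d - 1):
            yield (first,) + rest

def shorter_de_finetti(mu, m):
    """Apply (27.4): pass from the measure of a length-n sequence to that of its first m terms."""
    nu = {}
    for z, prob in mu.items():
        for w in count_vectors(m, len(z)):
            if all(wi <= zi for wi, zi in zip(w, z)):
                rest = tuple(zi - wi for wi, zi in zip(w, z))
                weight = Fraction(multinomial(w) * multinomial(rest), multinomial(z))
                nu[w] = nu.get(w, 0) + prob * weight
    return nu

# Illustrative measure (an assumption): n = 3, d = 2, written as count vectors.
mu = {(3, 0): Fraction(1, 2), (1, 2): Fraction(1, 2)}
nu = shorter_de_finetti(mu, m=2)
for w, p in sorted(nu.items()):
    print(w, p)
```

For this example the result can be confirmed by hand: the length-3 sequence is $(1,1,1)$ with probability $1/2$ and each arrangement of one 1 and two 2's with probability $1/6$, so the first two terms have count vectors $(2,0)$, $(1,1)$, $(0,2)$ with probabilities $1/2$, $1/3$, $1/6$.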


Problem 18. Let $(X_1, \dots, X_n)$ be an exchangeable sequence of $\{1, \dots, d\}$-valued
random variables. Show that

$$(e_{X_1}, e_{X_2}, \dots, e_{X_n})$$

is an exchangeable sequence of multivariate Bernoulli random vectors. Restate
some of the results of this section using multivariate Bernoulli terminology.


27.2. Infinite exchangeable sequences 


We need a variation on the last assertion in Proposition 4. 


Lemma 5. Let $X$ be an infinite exchangeable sequence of $\{1, \dots, d\}$-valued
random variables, and for each $m$, define $Y_m$ by (27.3) and $\mathcal{G}_m$ by

$$\mathcal{G}_m = \sigma(Y_m, Y_{m+1}, Y_{m+2}, \dots) . \tag{27.6}$$

Then the infinite sequence $(Y_1, Y_2, \dots)$ is a reverse martingale with respect to the
reverse filtration $(\mathcal{G}_1, \mathcal{G}_2, \dots)$.


Problem 19. Prove the preceding lemma. 




Definition and Proposition 6. Let $X = (X_1, X_2, \dots)$ be an infinite
exchangeable sequence of $\{1, \dots, d\}$-valued random variables. Then

$$Y = \lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} e_{X_k} \tag{27.7}$$

exists almost surely. The $\Delta^{(d-1)}$-valued random variable $Y$ is the limiting
empirical density of $X$, and the distribution of $Y$ is the De Finetti measure of
$X$.


PROOF. The result is an immediate consequence of Lemma 5 and the Reverse 
Martingale Convergence Theorem. O 


Problem 20. Suppose a fair coin is flipped once. Let $X_m = 2$ for all $m$ if the result
of the flip is heads, and $X_m = 1$ for all $m$ if the result is tails. Show that the
sequence $(X_m : m = 1, 2, \dots)$ is exchangeable and calculate its De Finetti measure
on $\Delta^1$.


Problem 21. Suppose that a fair die is rolled, and that based on the outcome
$U$, first a coin is chosen whose probability of heads is $U/7$ and then it is flipped
infinitely many times. Find the De Finetti measure of the infinite sequence of coin
flips. Express your answer as a distribution on $\Delta^1$ and also as a distribution on
$[0,1]$.


As an aid for its proof, the following theorem contains a more complicated 
notation for a limiting empirical density than might seem necessary. 


Theorem 7. [De Finetti (infinite case, special version)] Every probability
distribution $\mu$ on the standard $(d-1)$-simplex is the De Finetti measure of an
infinite exchangeable sequence of $\{1, \dots, d\}$-valued random variables whose
distribution is determined uniquely by $\mu$ via the two facts that (i) it is conditionally
iid given its limiting empirical density $Y_\infty$, and (ii) the conditional distribution
of each term in the sequence is $Y_\infty$.


PROOF. In view of Example 2 of Chapter 22, we only need prove that given 
its limiting empirical density Yə, every exchangeable sequence (X1, X2, X3...) 
is conditionally iid with common conditional distribution Y», the ¿t coordinate 
of which we denote by YŻ. 

Let Y„ denote the empirical density of the finite sequence (X1,...,.X,) and Y$ 
its i! coordinate. Fix m and a sequence (7),...,2m) of members of {1,..., d}. 
For 1 <i < d, let w; = Hk < m: £m = i}. For n > m, we obtain from 


27.2. INFINITE EXCHANGEABLE SEQUENCES 541 


Problem 13 that 
P[X, = zk for 1 < k < m | Yn] 


zil 
n 
(27.8) 7 a bes a) 


| (ava - r . “er - “a 


if nY} > w; for each i, and that otherwise this conditional probability equals 0. 
Moreover, a modification of the solution of Problem 13 shows that the left side 
of (27.8) can be replaced by 


Pg= ar ior 1S <n: Va aesa col 


and therefore by 
P(X, = Tk for 1 < k < m | Yn, Yo). 


Since Y, > Y% a.s. as n — oo, an easy limiting argument applied to the right 
side of (27.8) gives 


lim P[Xk = £p for 1 < k <m | Yp, Yoo) = [Y] ... [Y4]! 
n [o. @) 


Take the conditional expectation given $Y_\infty$ of both sides and use the Conditional
Bounded Convergence Theorem to obtain

$$P[X_k = x_k \text{ for } 1 \le k \le m \mid Y_\infty] = [Y_\infty^1]^{w_1} \cdots [Y_\infty^d]^{w_d}.$$

This formula characterizes the property of being conditionally iid given $Y_\infty$ with
common conditional density $i \mapsto Y_\infty^i$. □
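The limiting step in the preceding proof can be checked numerically. The following is a minimal sketch, not part of the original argument: it evaluates the right side of (27.8) exactly for given counts $nY_n^i$ and compares it with the product $[Y^1]^{w_1}\cdots[Y^d]^{w_d}$. The function names and the particular density `y` are illustrative assumptions.

```python
from fractions import Fraction
from math import prod

def falling(a, k):
    """Falling factorial a(a-1)...(a-k+1); equals 1 when k == 0."""
    return prod(a - j for j in range(k))

def cond_prob_given_counts(counts, w):
    """Right side of (27.8); counts[i] plays the role of n*Y_n^i and
    w[i] is the number of times value i+1 occurs in (x_1, ..., x_m)."""
    n, m = sum(counts), sum(w)
    if any(c < wi for c, wi in zip(counts, w)):
        return Fraction(0)  # the 'otherwise' case in the text
    numerator = prod(falling(c, wi) for c, wi in zip(counts, w))
    return Fraction(numerator, falling(n, m))

def limit_prob(y, w):
    """The limiting value [y^1]^{w_1} ... [y^d]^{w_d}."""
    return prod(yi ** wi for yi, wi in zip(y, w))

# Hypothetical limiting density on {1, 2, 3} and a pattern with m = 4.
y = (Fraction(1, 2), Fraction(3, 10), Fraction(1, 5))
w = (2, 1, 1)
for n in (10, 100, 10_000):
    counts = tuple(int(yi * n) for yi in y)
    print(n, float(cond_prob_given_counts(counts, w)), float(limit_prob(y, w)))
```

As $n$ grows with the empirical density held at `y`, the printed sampling-without-replacement probability approaches the iid product, which is the content of the 'easy limiting argument'.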


Problem 22. Find a single probability measure $\mu$ that can serve as a De Finetti
measure for exchangeable sequences of lengths 2, 4, and $\infty$, such that different
distributions are obtained for $X_1$ in the three different settings.


The following problem describes an explicit construction of infinite exchange- 
able sequences of Bernoulli random variables. 


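Problem 23. Let $\mu$ and $\lambda$, respectively, denote an arbitrary probability measure
and Lebesgue measure on [0,1]. Set

$$(\Omega, \mathcal{F}, P) = ([0,1], \mathcal{A}, \mu) \times \bigotimes_{m=1}^{\infty} ([0,1], \mathcal{A}, \lambda),$$

where $\mathcal{A}$ is the Borel $\sigma$-field on [0,1]. Write a typical member $\omega$ of $\Omega$ as $\omega =
(\omega_0, \omega_1, \omega_2, \dots)$, where each $\omega_m \in [0,1]$. For $m = 1, 2, \dots$, let

$$X_m(\omega) = \begin{cases} 1 & \text{if } \omega_m \le \omega_0 \\ 0 & \text{if } \omega_m > \omega_0. \end{cases}$$

Show that $(X_m : m = 1, 2, \dots)$ is an infinite exchangeable sequence of Bernoulli
random variables whose De Finetti measure is $\mu$.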


Problem 24. Let $Y$ denote the limiting empirical density of an infinite exchangeable
sequence $(X_1, X_2, \dots)$ of $\{1,2,\dots,d\}$-valued random variables. Fix $m$ and set
$W_i = \#\{k \le m : X_k = i\}$. Prove that the conditional distribution $\eta_m$ of $Y$ given
$(X_1,\dots,X_m)$ has a Radon-Nikodym derivative with respect to the De Finetti
measure $\mu$ of $X$ given by

$$\frac{d\eta_m}{d\mu}(p) = \frac{p_1^{W_1} \cdots p_d^{W_d}}{\int q_1^{W_1} \cdots q_d^{W_d}\, \mu(dq)},$$

where, of course, $p_i$ and $q_i$ are the $i$th coordinates of $p$ and $q$, respectively, and
$p \in \Delta^{(d-1)}$. The case of a 0 denominator requires a comment.


Problem 25. Find the (random) vector p that maximizes the Radon-Nikodym 
derivative in the preceding problem. 


Problem 26. Let $\mu$ denote the De Finetti measure of an infinite exchangeable
sequence $(X_1, X_2, \dots)$ of $\{1,\dots,d\}$-valued random variables. For $m < \infty$, let $\nu_m$
be the De Finetti measure of the finite exchangeable sequence $(X_1, X_2, \dots, X_m)$.
Show that

$$\nu_m\left\{\tfrac{1}{m}(w_1,\dots,w_d)\right\} = \frac{m!}{w_1! \cdots w_d!} \int y_1^{w_1} \cdots y_d^{w_d}\, \mu(dy)$$

for $w_1,\dots,w_d \in \mathbb{Z}^+$ for which $\sum_{i=1}^{d} w_i = m$.


Problem 27. If you have studied Chapter 14, continue the preceding problem by
showing that $\nu_m \to \mu$ as $m \to \infty$.


Problem 28. At least at the intuitive level, relate Proposition 4 and Problem 26
by letting $n \to \infty$ and each $z_i \to \infty$ in (27.4) so that $z_i/n \to y_i$.


Problem 29. Let $X = (X_1, X_2, \dots, X_n)$ be a finite exchangeable Bernoulli
sequence with De Finetti measure $\mu$ on [0,1]. Prove that a necessary condition
for the distribution of X to be the distribution of the first n terms of an infinite 
exchangeable sequence is that p{“**+} + p{ +} -— 2u{€} > 0 forl<w<n-1. 


27.3. Posterior distributions 


Problem 24 and, to a lesser extent, Problem 14 contain important results for a
certain type of application. With respect to Problem 24, an experimenter views
the limiting empirical density $Y$ as the quantity of interest. From the perspective
of the experimenter, $Y$ is random with a distribution $\mu$ based on relevant factors
such as the scientific judgement of people with experience. The distribution $\mu$ is
called the prior distribution. In an actual real-world situation, $Y$ is of course not
random; it has a certain value. It may, for instance, equal the probability that a
certain asymmetrical coin comes up ‘heads’. If one were to flip this coin infinitely
many times, one would, by the Strong Law of Large Numbers, learn the value of
$Y$. In practice one does only finitely many such experiments, and on the basis of
the results of these, obtains the distribution $\eta_m$ of $Y$ described in Problem 24.
Since $\eta_m$ is calculated on the basis of the experimental results, it is called a
posterior distribution of $Y$. The adjective ‘Bayesian’ is attached to statisticians
who favor the methodology: get good prior densities or distributions and then
calculate posterior densities or distributions based on results of experiments.
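For a prior $\mu$ concentrated on finitely many values, the passage from prior to posterior can be carried out by direct reweighting, in the spirit of the Radon-Nikodym derivative of Problem 24. The following sketch treats only the Bernoulli case and assumes a hypothetical five-point prior; the function name `posterior` and the data are illustrative, not taken from the text.

```python
from fractions import Fraction

def posterior(prior, observations):
    """prior: dict mapping p (probability of a 1) to its prior mass.
    observations: iterable of 0/1 values.
    Returns the posterior masses, proportional to mass * p^ones * (1-p)^zeros."""
    obs = list(observations)
    ones = sum(obs)
    zeros = len(obs) - ones
    weights = {p: mass * p ** ones * (1 - p) ** zeros for p, mass in prior.items()}
    total = sum(weights.values())
    if total == 0:
        # The zero-denominator case: the data have prior probability 0.
        raise ValueError("observations have prior probability 0")
    return {p: wgt / total for p, wgt in weights.items()}

# Hypothetical prior: mass 1/5 at each of p = 0, 1/4, 1/2, 3/4, 1.
prior = {Fraction(k, 4): Fraction(1, 5) for k in range(5)}
post = posterior(prior, [1, 1, 0])
for p in sorted(post):
    print(p, post[p])
```

Observing both a 1 and a 0 rules out the extreme values $p = 0$ and $p = 1$, so the posterior is carried by the three interior points.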

The formula obtained in Problem 14 may be regarded as a formula for the 
posterior distribution of the contents of an urn in terms of the prior distribution 
and some observations based on sampling without replacement. 


Problem 30. An urn contains 6 balls, each of which is colored red, yellow, or blue. 
Show that there are 28 possible color arrangements, including those that do not use 
all 3 colors. Suppose that the prior distribution assigns probability 1/28 to each of 
the 28 arrangements. Then three balls are drawn without replacement and one ball 
of each color is obtained. Use Problem 14 to calculate the posterior probabilities 
of the various color arrangements of the 6 balls originally in the urn. 


Problem 31. Suppose that the prior distribution of a conditionally iid Bernoulli
sequence is standard beta with parameters $\alpha$ and $\beta$, when viewed as a measure on
[0,1]. Show that any posterior distribution is also beta, and find a formula for the
parameters of the posterior distribution in terms of $\alpha$, $\beta$, and the observations.


Problem 32. For the case of exchangeable Bernoulli sequences, extend the result
of Problem 24 by describing the conditional distribution of $(Y, X_{m+1}, X_{m+2})$ given
$(X_1,\dots,X_m)$. Use your answer to calculate the conditional expectation of $X_{m+1}$
given $(X_1,\dots,X_m)$ for the case where the De Finetti measure is Lebesgue measure
on [0, 1].


Problem 33. Let $X = (X_1, X_2, \dots)$ be an infinite exchangeable sequence with
De Finetti measure $\mu$. In terms of $\mu$ and $(X_1,\dots,X_m)$, describe the conditional
distribution of $X$ given $(X_1,\dots,X_m)$.


The following four problems treat infinite exchangeable Bernoulli sequences 
having De Finetti measures that are beta distributions. In this special setting 
martingales appear, in addition to the reverse martingales that arise generally. 


Problem 34. Let $X = (X_1, X_2, \dots)$ be an infinite exchangeable Bernoulli sequence
the De Finetti measure of which is a standard beta distribution with parameters
$\alpha$ and $\beta$. For $n = 0, 1, 2, \dots$, let $S_n = \sum_{m=1}^{n} X_m$ and $\mathcal{F}_n = \sigma(X_1, X_2, \dots, X_n)$,
where, according to standard conventions, $S_0 = 0$ and $\mathcal{F}_0$ is the trivial $\sigma$-field.
Show that the sequence

$$\left( \frac{\alpha + S_n}{\alpha + \beta + n} : n = 0, 1, 2, \dots \right)$$

is a martingale with respect to the filtration $(\mathcal{F}_n : n = 0, 1, \dots)$.
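Before attempting a proof, one may wish to confirm the one-step martingale identity by exact arithmetic. The sketch below is restricted to positive integer $\alpha$ and $\beta$ (an assumption made only so that beta integrals reduce to factorials) and computes all probabilities directly from integrals of $p^s(1-p)^{n-s}$ against the beta prior. All function names are illustrative.

```python
from fractions import Fraction
from math import factorial

def beta_int(a, b):
    """Beta function B(a, b) for positive integers: (a-1)!(b-1)!/(a+b-1)!."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))

def prob_of_sequence(alpha, beta, s, n):
    """Probability that (X_1..X_n) equals one fixed 0/1 string with s ones:
    the integral of p^s (1-p)^(n-s) against the beta(alpha, beta) prior."""
    return beta_int(alpha + s, beta + n - s) / beta_int(alpha, beta)

def predictive(alpha, beta, s, n):
    """P[X_{n+1} = 1 | a fixed string of length n with s ones]."""
    return prob_of_sequence(alpha, beta, s + 1, n + 1) / prob_of_sequence(alpha, beta, s, n)

def m_value(alpha, beta, s, n):
    """The candidate martingale (alpha + S_n)/(alpha + beta + n) at S_n = s."""
    return Fraction(alpha + s, alpha + beta + n)

def one_step_expectation(alpha, beta, s, n):
    """E[M_{n+1} | F_n] on the event S_n = s."""
    p1 = predictive(alpha, beta, s, n)
    return (p1 * m_value(alpha, beta, s + 1, n + 1)
            + (1 - p1) * m_value(alpha, beta, s, n + 1))

alpha, beta = 2, 3  # hypothetical integer parameters
ok = all(one_step_expectation(alpha, beta, s, n) == m_value(alpha, beta, s, n)
         for n in range(6) for s in range(n + 1))
print(ok)
```

A finite check of this kind is of course no substitute for the proof requested in the problem; it only confirms the identity for the parameter values tried.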




Problem 35. Let $Y$ denote the limiting empirical distribution of the exchangeable
sequence $X$ of the preceding problem. Find a formula for $E(Y \mid \mathcal{F}_n)$, where
$\mathcal{F}_n = \sigma(X_1,\dots,X_n)$. Hint: Copious calculations are not required.


Problem 36. Suppose that $\alpha$ and $\beta$ in Problem 34 and Problem 35 are positive
integers and relate these exercises to Polya urns. In particular, calculate the
distribution of the limiting proportion of blue balls in the urn in case the urn initially
has one blue ball and one orange ball and one additional ball of the drawn color is
inserted after each draw.


Problem 37. Modify the description of Polya urns so that the preceding problem
becomes interesting even when the positive numbers $\alpha$ and $\beta$ are not necessarily
integers. Hint: Imagine that the urn contains $\alpha$ ounces of blue sand and $\beta$ ounces
of green sand.


27.4. † Generalization to Borel spaces


We set the goal of generalizing earlier concepts and results to the setting of
exchangeable sequences of random variables taking values in a Borel space $\Psi$.
Proposition 1 asserting that finite exchangeable sequences can be obtained by 
randomly permuting the order of an arbitrary sequence of random variables is 
stated at the desired level of generalization. 

All probability measures on the finite set {1,...,d} have densities with respect 
to counting measure. For this reason we were able to focus on densities rather 
than distributions in Definition 2. Now we must focus on distributions. This 
change forces a change in the definition of ‘De Finetti measure’—a technical 
change, not just a straight generalization. 


Definition 8. Let $X = (X_1,\dots,X_n)$ be a finite exchangeable sequence of
random variables taking values in a Borel space $(\Psi, \mathcal{G})$. The empirical
distribution of $X$ is the random distribution $Y$ defined by

$$Y(B) = \frac{1}{n} \sum_{k=1}^{n} I_B \circ X_k, \quad B \in \mathcal{G},$$

where $I_B$ denotes the indicator function of $B$. The De Finetti measure of $X$ is
the distribution of $Y$.
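For one realization of the sequence, the empirical distribution is simply the set function that reports the fraction of terms falling in $B$. A minimal sketch follows, in which a set $B$ is represented by a predicate standing in for the indicator $I_B$; the sample values and the name `empirical_distribution` are assumptions made for illustration.

```python
from fractions import Fraction

def empirical_distribution(sample):
    """Return Y, where Y(B) is the fraction of sample points lying in B."""
    n = len(sample)

    def Y(B):
        # B is a predicate playing the role of the indicator function I_B.
        return Fraction(sum(1 for x in sample if B(x)), n)

    return Y

Y = empirical_distribution([0.2, 0.7, 0.7, 0.9])  # hypothetical realization
print(Y(lambda x: x <= 0.5), Y(lambda x: x == 0.7))
```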


Here the De Finetti measure is a distribution on the space $\mathcal{Q}$ of distributions
on $\Psi$, whereas in Definition 2 it is a distribution on the space of densities on
$\Psi = \{1,\dots,d\}$, viewed as members of $\Delta^{(d-1)}$.


Problem 38. Find the De Finetti measure of an iid sequence of 2 random variables 
uniformly distributed on [0, 1]. 




* Problem 39. For $n = 2$ apply Proposition 1 in case $Z_1$ and $Z_2$ are independent
standard exponentially distributed random variables with means 1 and 2,
respectively. For the corresponding exchangeable sequence $(X_1, X_2)$, calculate the
densities of $X_1$, $X_2$, and $(X_1, X_2)$. Also calculate the De Finetti measure.


Problem 40. On intuitive grounds guess whether the correlation of $X_1$ and $X_2$ in
the preceding problem is positive, negative, or 0. Then calculate the correlation.


Here is our desired generalization of Theorem 3. 


Theorem 9. [De Finetti (finite case)] Let $\mathcal{Q}$ denote the space of probability
measures on a Borel space $\Psi$. The distribution of a length-$n$ exchangeable
sequence of $\Psi$-valued random variables is uniquely determined by its De Finetti
measure, and the family of such De Finetti measures consists of the probability
measures on $\mathcal{Q}$ that assign probability 1 to the set of distributions of the form

$$\frac{1}{n} \sum_{k=1}^{n} \delta_{x_k}, \tag{27.9}$$

where $\delta_x$ denotes the delta distribution at $x$.


Much of the proof of Theorem 3 carries over to form a portion of the proof 
of Theorem 9. However, the uniqueness aspect of the proof of Theorem 3 does 
not in itself establish the uniqueness asserted in Theorem 9. What it does do is 
give enough information so that the following lemma applies to give the desired 
uniqueness. 


Lemma 10. Let $U$ and $V$ be two finite or infinite equally long sequences of
random variables having values in a Borel space $\Psi$. Let $n$ be no larger than
the length of the sequences. Suppose that for every finite measurable partition
$(C_1,\dots,C_d)$ of $\Psi$,

$$P[U_m \in B_m \text{ for } 1 \le m \le n] = P[V_m \in B_m \text{ for } 1 \le m \le n] \tag{27.10}$$

for all choices of $B_m \in \{C_1,\dots,C_d\}$. Then the distributions of $(U_1,\dots,U_n)$ and
$(V_1,\dots,V_n)$ are identical. If $U$ and $V$ are infinite sequences and (27.10) holds
for every $n$, then $U$ and $V$ have the same distribution.


Problem 41. Prove the preceding lemma. 


Problem 42. Check the assertion made in the paragraph preceding Lemma 10 that 
the proof of Theorem 9 is now essentially complete. 


Problem 43. Create an urn model relevant for Theorem 9. 


For treating infinite sequences we need a variation of Lemma 5. 




Lemma 11. Let $X$ be an infinite exchangeable sequence of random variables
taking values in a Borel space $\Psi$, and for each $m$, define $Y_m$ by

$$Y_m(B) = \frac{1}{m} \sum_{k=1}^{m} I_B \circ X_k$$

and $\mathcal{G}_m$ by

$$\mathcal{G}_m = \sigma(Y_m, Y_{m+1}, Y_{m+2}, \dots).$$

Then for each measurable $B$, the sequence $(Y_1(B), Y_2(B), \dots)$ is a reverse
martingale with respect to the reverse filtration $(\mathcal{G}_1, \mathcal{G}_2, \dots)$.


Problem 44. Prove the preceding lemma, taking care that your proof uses the
specified filtration, not just the minimal filtration of $(Y_1(B), Y_2(B), \dots)$.


As for the special case of {1,...,d}-valued random variables, the general 
definition of ‘De Finetti measure’ for infinite exchangeable sequences is based 
on a fact that must be proved. 


Definition and Proposition 12. Let $(\Psi, \mathcal{G})$ be a Borel space and $X$ an
infinite exchangeable sequence of $\Psi$-valued random variables. Then there exists
a random distribution $Y$ on $(\Psi, \mathcal{G})$ such that for each $B \in \mathcal{G}$,

$$Y(B) = \lim_{n\to\infty} \frac{1}{n} \sum_{m=1}^{n} I_B \circ X_m \tag{27.11}$$

almost surely. The distribution of $Y$ is the De Finetti measure of $X$, and $Y$ itself
is the limiting empirical distribution of $X$.


PARTIAL PROOF. We assume that $\Psi = \mathbb{R}$, without loss of generality. (Why
can we do this?) Let $Y_m$ and $\mathcal{G}_m$ be as in Lemma 11, and set $\mathcal{G}_\infty = \bigcap_{m=1}^{\infty} \mathcal{G}_m$.
Fix a measurable set $B$. By Lemma 11 and the Reverse Martingale Convergence
Theorem, the sequence $(Y_m(B) : m \ge 1)$ converges almost surely to a random
variable $Y_\infty(B)$, and

$$Y_\infty(B) = E(Y_m(B) \mid \mathcal{G}_\infty) \quad \text{a.s.} \tag{27.12}$$

for all $m$. By working with $Y_\infty((-\infty, y])$ for $y$ rational and then using limits
from the right we obtain a random distribution function. Let $Y$ denote the
corresponding random distribution.

We leave it for the reader to prove that the set $\{(\omega, y) : Y(\omega)\{y\} > 0\}$ is a
measurable subset of $\Omega \times \mathbb{R}$, where $\Omega$ denotes the underlying probability space.
By the Fubini Theorem the integral of its indicator function with respect to $P \times \lambda$,
where $\lambda$ denotes Lebesgue measure, can be calculated via iterated integrals in
either order. Integration first with respect to $\lambda$ gives 0 for every $\omega \in \Omega$. Then
integration with respect to $P$ gives 0 as the value of the iterated integral. Hence,
when the iteration is done in the other order, it must be that for almost every
$y \in \mathbb{R}$, the integral with respect to $P$ equals 0. Therefore, there is a
(nonrandom) dense subset $D$ of $\mathbb{R}$ such that $Y\{y\} = 0$ a.s. for $y \in D$. It is clear
that $Y_\infty(-\infty, y] = Y(-\infty, y]$ a.s. for $y \in D$. Also, $Y_\infty(\mathbb{R}) = Y(\mathbb{R})$. To show that

$$Y_\infty(B) = Y(B) \quad \text{a.s.} \tag{27.13}$$

for each Borel $B \subset \mathbb{R}$ and thereby finish the proof, we only need show that the
collection of $B$ for which (27.13) holds is a Sierpiński class, for then the Sierpiński
Class Theorem applies.

The collection of $B$ for which (27.13) holds is clearly closed under proper
differences. That it is closed under increasing limits follows from (27.12) and the
Conditional Monotone Convergence Theorem. □


Problem 45. Complete the proof of Proposition 6 by showing the measurability of
$\{(\omega, y) : Y(\omega)\{y\} > 0\}$.


Problem 46. Show that the conclusion of Proposition 6 cannot be strengthened to 
the assertion that almost surely, (27.11) holds for every Borel set B. 


Theorem 13. [De Finetti (infinite case)] Let $\mathcal{Q}$ denote the space of probability
measures on a Borel space $\Psi$. The distribution of an infinite exchangeable
sequence of $\Psi$-valued random variables is uniquely determined by its De Finetti
measure via the two facts that it is (i) conditionally iid given its limiting
empirical distribution $Y$ and (ii) the conditional distribution of each term in the
sequence is $Y$. The family of such De Finetti measures consists of all probability
measures on $\mathcal{Q}$.


PROOF. In view of Example 2 of Chapter 22, we only need prove that each
exchangeable sequence $(X_1, X_2, X_3, \dots)$ is conditionally iid given its limiting
empirical distribution $Y$ and that the common conditional distribution is $Y$. This
conclusion follows from Theorem 7 and Lemma 10. □


Example 1. Denote by $\mu$ the distribution of the random distribution $Y$
defined by

$$Y(\omega)\{i\} = (1 - V(\omega))\, V^{i-1}(\omega), \quad i = 1, 2, \dots,$$

where $V$ is uniformly distributed on the interval (0,1). Let us study an infinite
exchangeable sequence $(X_1, X_2, \dots)$ having De Finetti measure $\mu$.
For $v \in [0,1]$ and positive integers $x_1, x_2, \dots, x_m$ we calculate

$$P[V \le v,\ X_1 = x_1, \dots, X_m = x_m] = \int_0^v \prod_{k=1}^{m} \left[(1-u)\,u^{x_k - 1}\right] du = \int_0^v (1-u)^m u^{s-m}\, du, \tag{27.14}$$

where $s = \sum_{k=1}^{m} x_k$. By using our knowledge of the beta distribution, we can
calculate the integral explicitly in case $v = 1$ to obtain

$$P[X_1 = x_1, \dots, X_m = x_m] = \frac{m!\,(s-m)!}{(s+1)!}.$$
Given $X_1,\dots,X_m$, we are interested in the conditional distribution of $Y$,
which we might view as an unknown distribution, in which case $X_1,\dots,X_m$ are
$m$ ‘independent’ observations based on the unknown distribution $Y$. Since $Y$
is determined by the $\mathbb{R}$-valued random variable $V$, we focus on the conditional
distribution of $V$ given $X_1,\dots,X_m$:

$$P[V \le v \mid X_1 = x_1, \dots, X_m = x_m] = \frac{P[V \le v,\ X_1 = x_1, \dots, X_m = x_m]}{P[X_1 = x_1, \dots, X_m = x_m]}.$$

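From (27.14) we see that the conditional distribution of $V$ given $(X_1,\dots,X_m)$
is standard beta with parameters $1 - m + \sum_{k=1}^{m} X_k$ and $1 + m$; that is, its
density on (0,1) is proportional to $u^{\sum_{k} X_k - m}(1-u)^{m}$.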
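The closed form for $P[X_1 = x_1,\dots,X_m = x_m]$ in this example can be confirmed by exact integration. The sketch below expands $(1-u)^m$ by the binomial theorem, integrates term by term over [0,1], and compares the result with $m!\,(s-m)!/(s+1)!$. The observed values `xs` are hypothetical, and the function names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial

def integral_exact(m, s):
    """Integral over [0, 1] of (1-u)^m u^(s-m) du, computed by expanding
    (1-u)^m binomially and integrating each monomial."""
    return sum(Fraction((-1) ** j * comb(m, j), s - m + j + 1) for j in range(m + 1))

def closed_form(m, s):
    """The value m!(s-m)!/(s+1)! stated in the example."""
    return Fraction(factorial(m) * factorial(s - m), factorial(s + 1))

xs = (3, 1, 4, 1)          # hypothetical observed values x_1..x_m
m, s = len(xs), sum(xs)
print(integral_exact(m, s), closed_form(m, s))
```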


* Problem 47. Mimic the preceding example in case the values of $Y$ are exponential
distributions having support $[0, \infty)$, with the reciprocal of the random mean being
itself exponentially distributed with mean 1.


Problem 48. For the preceding problem and for Example 1, calculate the
(unconditional) expectation of $X_1$. Comment on your answers.


Problem 49. Modify Example 1 by replacing the uniform distribution on $V$ by
an arbitrary, but fixed, standard beta distribution. For which values of the beta
parameters does $X_1$ have finite (unconditional) expectation? What about finite
(unconditional) variance?


Problem 50. Modify Problem 47 by replacing the exponential distribution for the
reciprocal of the (random) mean by an arbitrary standard gamma distribution
having support $[0, \infty)$. For which values of the gamma parameter does $X_1$ have
finite expectation? What about finite variance?


Problem 51. Mimic Example 1 in case the values of $Y$ are beta distributions having
support [0,1] and density there with respect to Lebesgue measure $\lambda$ given by:

$$\frac{dY(\omega)}{d\lambda}(x) = \frac{\Gamma(U(\omega) + V(\omega))}{\Gamma(U(\omega))\,\Gamma(V(\omega))}\, x^{U(\omega)-1}(1-x)^{V(\omega)-1},$$

where $U + V$ is exponentially distributed on $[0, \infty)$ with mean 1, the conditional
distribution of $U$ given $U + V$ is uniform on the interval $(0, U + V)$, and $\Gamma$ denotes
the gamma function.




Example 2. [Broken stick as probability distribution] Let $V = (V_1, V_2, \dots)$
be an independent sequence of random variables each of which is uniformly
distributed on the interval (0,1). Let $Y$ be the random measure defined by

$$Y\{i\} = V_i \prod_{j=1}^{i-1} (1 - V_j),$$

where, as is usual, an empty product equals 1. Mathematical induction shows
that

$$Y(\{1, 2, \dots, i\}) = 1 - \prod_{j=1}^{i} (1 - V_j),$$

which approaches 1 a.s. as $i \to \infty$. Thus, with probability 1, $Y$ is a probability
measure on the set of positive integers, a measure closely related to the
stick-breaking random walk of Problem 17 in Chapter 12.

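Let $(X_m : m \ge 1)$ be an infinite exchangeable sequence with De Finetti
measure $\mu$ equal to the distribution of $Y$. Even in this situation where the support of
$\mu$ is large, we can make explicit calculations. For positive integers $x_1, x_2, \dots, x_m$,
we set $w_i = \#\{k \le m : x_k = i\}$ for $i = 1, 2, \dots$ and calculate

$$P[X_k = x_k \text{ for } 1 \le k \le m] = E\left( \prod_{i=1}^{\infty} \Big[ V_i \prod_{j=1}^{i-1} (1 - V_j) \Big]^{w_i} \right) = E\left( \prod_{i=1}^{\infty} V_i^{w_i} (1 - V_i)^{w_{i+1} + w_{i+2} + \cdots} \right)$$

$$= \prod_{i=1}^{\infty} \int_0^1 u^{w_i} (1-u)^{w_{i+1} + w_{i+2} + \cdots}\, du = \prod_{i=1}^{\infty} \frac{w_i!\, \big(\sum_{j=i+1}^{\infty} w_j\big)!}{\big(1 + \sum_{j=i}^{\infty} w_j\big)!},$$

where all but finitely many of the factors in the infinite product equal 1.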

Since $Y$ is defined in terms of $V$ we are interested in the conditional
distribution function of $V$ given $(X_1, X_2, \dots, X_m)$. The preceding calculation gives
the denominator of the conditional distribution function, and the numerator is
obtained from the preceding calculation by replacing the interval [0,1] of
integration on $u$ by intervals of the form $[0, v]$. Thus, the random sequence $V$ is
conditionally independent given $(X_1,\dots,X_m)$ and the conditional distribution
of the $i$th member of $V$ is a standard beta distribution with parameters $1 + W_i$
and $1 + \sum_{j=i+1}^{\infty} W_j$, where $W_i(\omega) = \#\{k \le m : X_k(\omega) = i\}$.
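The stick-breaking construction of $Y$ in this example is easy to carry out for finitely many breaks. The following sketch builds the weights $Y\{i\}$ from a given list of values $V_1,\dots,V_r$ and checks the cumulative identity $Y(\{1,\dots,i\}) = 1 - \prod_{j \le i}(1 - V_j)$ numerically. The seed, the number of breaks, and the function name are arbitrary choices made for illustration.

```python
import random
from math import prod

def stick_breaking(v):
    """Return [Y{1}, ..., Y{r}] with Y{i} = V_i * prod_{j<i} (1 - V_j)."""
    weights, remaining = [], 1.0
    for vi in v:
        weights.append(vi * remaining)   # break off the fraction V_i of what is left
        remaining *= 1.0 - vi            # length of stick still unbroken
    return weights

rng = random.Random(0)                   # fixed seed so the run is reproducible
v = [rng.random() for _ in range(50)]    # iid uniform(0,1) break fractions
y = stick_breaking(v)
lhs = sum(y)                             # Y({1, ..., 50})
rhs = 1.0 - prod(1.0 - vi for vi in v)   # the formula proved by induction
print(lhs, rhs)
```

With 50 breaks the unbroken remainder is typically negligible, in line with the statement that $Y(\{1,\dots,i\}) \to 1$ almost surely.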


* Problem 52. Give an intuitive explanation of the conditional independence of the
sequence $V$ described in the preceding example.




Problem 53. [GEM distributions] Modify Example 2 by replacing the
(unconditional) uniform distribution of each $V_i$ by a beta distribution with density

$$v \mapsto \theta (1 - v)^{\theta - 1}, \quad 0 < v < 1,$$

where $\theta$ is a fixed positive constant. The distribution of the random measure $Y$ is
called the GEM distribution with parameter $\theta$.


27.5. ‡ Ferguson distributions and Blackwell-MacQueen urns


Suppose that the De Finetti measure for an infinite exchangeable sequence of 
Bernoulli random variables is a beta distribution. We saw in Problem 31 that 
the posterior distributions of the Bernoulli parameter are also beta distribu- 
tions. For this reason beta distributions are important De Finetti measures. As 
the following example shows, Dirichlet distributions have a similar property for 
exchangeable sequences of {1,2,...,d}-valued random variables. 


Proposition 14. Let $(X_1, X_2, X_3, \dots)$ be an infinite exchangeable sequence
of $\{1,\dots,d\}$-valued random variables whose De Finetti measure is Dirichlet with
parameters $\gamma_i$, $1 \le i \le d$, when regarded as a distribution on the standard
$(d-1)$-simplex. For members $x_1,\dots,x_m$ of $\{1,\dots,d\}$, set $w_i = \#\{k \le m : x_k = i\}$.
Then

$$P[X_k = x_k \text{ for } 1 \le k \le m] = \frac{\prod_{i=1}^{d} (\gamma_i)^{\uparrow}_{w_i}}{\big(\sum_{i=1}^{d} \gamma_i\big)^{\uparrow}_{m}},$$

where $(a)^{\uparrow}_{b}$ denotes the rising factorial $\prod_{c=0}^{b-1} (a + c)$. Also, the posterior
distribution of $Y$ based on $m$ observations is Dirichlet with parameters $(\gamma_i + W_i)$,
$1 \le i \le d$, where $W_i = \#\{k \le m : X_k = i\}$.


Problem 54. Prove the preceding proposition. Hint: Use Problem 24 for part of 
the proof. 


We have seen in Problem 37 that an infinite exchangeable sequence of Bernoulli 
random variables having a specified beta distribution for its De Finetti measure 
can be constructed via a ‘generalized Polya urn model’ in which it is not required 
that the initial contents be described by integers. The following example extends 
this construction to accommodate Dirichlet distributions. 


Example 3. [Blackwell-MacQueen urns, special case] Consider an urn
containing sands of different colors: $\gamma_i$ units of color $i$, $1 \le i \le d$, where each $\gamma_i > 0$.
We imagine that the ‘grains’ of sand are infinitesimal in size so that there is no
divisibility requirement on the numbers $\gamma_i$. At each time $m = 1, 2, \dots$, a grain
of sand is drawn, the probability of each color being proportional to the amount of
sand of that color, and then it is returned to the urn (actually not relevant since
it is infinitesimal in size) and one additional unit of sand of its color is poured
into the urn. Thus, after the $m$th stage, the amount of sand in the urn equals
$m + \sum_{i=1}^{d} \gamma_i$. For $m = 1, 2, \dots$, let $X_m$ denote the color (1, 2, ..., or $d$) drawn
at time $m$. An induction argument shows that

$$P[(X_1,\dots,X_m) = (a_1,\dots,a_m)] = \frac{\prod_{i=1}^{d} (\gamma_i)^{\uparrow}_{w_i}}{\big(\sum_{i=1}^{d} \gamma_i\big)^{\uparrow}_{m}},$$

where $w_i = \#\{k \le m : a_k = i\}$.

By comparing this formula with Proposition 14, we see that every Dirichlet
distribution can be viewed as the De Finetti measure of an exchangeable sequence
arising from an appropriate Blackwell-MacQueen urn.
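The rising-factorial formula of this example can be checked against a direct multiplication of the one-step draw probabilities. The sketch below uses exact rational arithmetic with a hypothetical initial vector of sand amounts (not necessarily integers) and also confirms that every reordering of a color sequence receives the same probability. Colors are numbered from 0 in the code, and all names are illustrative.

```python
from fractions import Fraction
from itertools import permutations
from math import prod

def rising(a, k):
    """Rising factorial a(a+1)...(a+k-1); equals 1 when k == 0."""
    return prod(a + c for c in range(k))

def urn_probability(gamma, colors):
    """Multiply the successive draw probabilities for the sequence `colors`,
    adding one unit of the drawn color after each draw."""
    amounts = list(gamma)
    p = Fraction(1)
    for a in colors:
        p *= Fraction(amounts[a]) / sum(amounts)
        amounts[a] += 1
    return p

def formula(gamma, colors):
    """The closed form: product of rising factorials over a rising factorial."""
    w = [colors.count(i) for i in range(len(gamma))]
    numerator = prod(rising(g, wi) for g, wi in zip(gamma, w))
    return Fraction(numerator) / rising(sum(gamma), len(colors))

gamma = (Fraction(1, 2), Fraction(3, 2), Fraction(2))   # hypothetical initial contents
colors = (0, 2, 0, 1)
print(urn_probability(gamma, colors), formula(gamma, colors))
exchangeable = len({urn_probability(gamma, perm) for perm in permutations(colors)}) == 1
print(exchangeable)
```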


* Problem 55. Carry out the induction mentioned in the preceding example. 


Problem 56. Consider a Blackwell-MacQueen urn with initial contents being three 
units of red sand, one unit of brown sand, and one unit of gray sand. Calculate 
the probability that in the long run the contents of the urn is more than 50% red. 


Problem 57. As $\gamma \searrow 0$ the gamma distribution with parameter $\gamma$ and scaling
parameter $a$ (not depending on $\gamma$) approaches the delta distribution at 0. Thus
the delta distribution at 0 may be called a gamma distribution with parameter 0.
Use this fact to generalize the definition of ‘Dirichlet distribution’ to accommodate
parameters $(\gamma_1,\dots,\gamma_d)$ in which some but not all of the coordinates might equal 0.
Then comment on corresponding generalizations of Proposition 14 and Example 3.


We will continue the process of generalization by moving from exchangeable 
sequences of random variables having values in a finite set to those having values 
in an arbitrary Borel space. As we generalize we want to focus on De Finetti 
measures having the nice properties that Dirichlet distributions have—namely 
those described in Proposition 14 of this chapter and Problem 31 of Chapter 10. 
We use the indirect route of first generalizing a model rather than a definition. 

To generalize Example 3 we replace the set $\{1,\dots,d\}$ of colors by a Borel
space $\Psi$ of colors, and the vector $(\gamma_1,\dots,\gamma_d)$ describing the color of the initial
contents of the urn by a nonzero (meaning not identically zero) finite measure
$\xi$ on $\Psi$. The probability that the color of the first grain drawn from the urn
belongs to a Borel set $B$ is $\xi(B)/\xi(\Psi)$. As in the above example a unit amount of
sand of the same color as the drawn grain is added to the urn at each time
$m = 1, 2, \dots$. The preceding sentences describe the Blackwell-MacQueen urn
with initial measure $\xi$.
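The urn just described can be simulated directly. After $k$ draws the urn holds the original sand, of total mass $\xi(\Psi)$, together with one unit of sand for each earlier draw; so the next grain is original sand with probability $\xi(\Psi)/(\xi(\Psi)+k)$ and otherwise repeats a uniformly chosen earlier color. The sketch below assumes, purely for illustration, that $\Psi = [0,1)$ and that $\xi$ is 2 times the uniform distribution; the function name and arguments are hypothetical.

```python
import random

def blackwell_macqueen(total_mass, base_sampler, m, rng):
    """Draw m colors from an urn whose initial measure is
    total_mass times the law sampled by base_sampler."""
    draws = []
    for k in range(m):
        # Urn mass is total_mass + k; original sand is chosen with
        # probability total_mass / (total_mass + k).
        if rng.random() * (total_mass + k) < total_mass:
            draws.append(base_sampler(rng))
        else:
            # Each earlier draw contributed one unit of its own color.
            draws.append(rng.choice(draws))
    return draws

rng = random.Random(1)   # fixed seed for reproducibility
colors = blackwell_macqueen(2.0, lambda r: r.random(), 200, rng)
print(len(colors), len(set(colors)))
```

Because the initial measure in this illustration has no atoms, repeated colors arise only from the added sand, and the number of distinct colors among the first $m$ draws is typically much smaller than $m$.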


Definition and Proposition 15. Let $\xi$ be a nonzero finite measure on a
Borel space $\Psi$, and denote by $\mathcal{Q}(\Psi)$ the Borel space of distributions on $\Psi$. The
sequence of colors drawn from a Blackwell-MacQueen urn with initial measure $\xi$
is an exchangeable sequence, the De Finetti measure $\mu$ of which is the Ferguson
distribution with parameter $\xi$. The measure $\mu$ is the unique probability measure
on $\mathcal{Q}(\Psi)$ that, for every finite measurable partition $(B_1,\dots,B_d)$ of $\Psi$, induces the
Dirichlet distribution on $\mathbb{R}^d$ with parameter $(\xi(B_1),\dots,\xi(B_d))$, via the mapping
$y \mapsto (y(B_1),\dots,y(B_d))$ from $\mathcal{Q}(\Psi)$ to $\mathbb{R}^d$.


PROOF. Consider the Blackwell-MacQueen urn with initial measure $\xi$. By
renaming all the colors in $B_r$ as $r$, $1 \le r \le d$, we obtain a Blackwell-MacQueen urn
as described in Example 3 with initial parameter $(\xi(B_1),\dots,\xi(B_d))$. From that
example we see that the sequence of drawn colors from the Blackwell-MacQueen
urn with initial measure $\xi$ is exchangeable and that the corresponding De Finetti
measure induces the desired Dirichlet distribution.

For the proof of uniqueness, let $\lambda$ be a probability measure on $\mathcal{Q}(\Psi)$ that
induces the Dirichlet distribution with parameter $(\xi(B_1),\dots,\xi(B_d))$ on $\mathbb{R}^d$ via
the mapping $y \mapsto (y(B_1),\dots,y(B_d))$ from $\mathcal{Q}(\Psi)$ to $\mathbb{R}^d$. We will prove that
$\lambda = \mu$ by proving that corresponding exchangeable sequences $(X_{\lambda,1}, X_{\lambda,2}, \dots)$
(not necessarily arising from a Blackwell-MacQueen urn) and $(X_{\mu,1}, X_{\mu,2}, \dots)$
have the same distribution. Fix a measurable partition $(B_1,\dots,B_d)$; for any
finite sequence $(A_1,\dots,A_k)$ where each $A_m$ equals some $B_r$, we have

$$P[X_{\lambda,m} \in A_m,\ 1 \le m \le k] = P[X_{\mu,m} \in A_m,\ 1 \le m \le k].$$

From this it follows that the distributions of the two exchangeable sequences are
identical and thus, by the definition of De Finetti measure, that $\lambda = \mu$. □


Problem 58. Prove that with probability 1, infinitely many different colors are
drawn from any Blackwell-MacQueen urn whose initial measure $\xi$ has the property
that there exists an infinite partition $\{B_j : j = 1, 2, \dots\}$ such that $\xi(B_j) > 0$ for
every $j$.


Problem 59. For a Blackwell-MacQueen urn with initial measure $\xi$ on a Borel
space $\Psi$, let $\Xi_m$, for $m = 0, 1, 2, \dots$, denote the measure describing the color
distribution of the sand in the urn immediately after time $m$. [Notice that $\Xi_0 = \xi$
and $\Xi_m(\Psi) = m + \xi(\Psi)$.] The description of the urn makes it clear that the
random sequence $(\Xi_m : m \ge 0)$ is a time-homogeneous Markov sequence in the
space of finite measures on $\Psi$. Find the corresponding transition distributions.


Problem 60. For a sequence $(\Xi_m : m \ge 0)$ of random measures constructed on $\Psi$
as in the preceding problem and any measurable $B \subset \Psi$, show that

$$\left( \frac{\Xi_m(B)}{\Xi_m(\Psi)} : m \ge 0 \right)$$

is a martingale, the limit of which has a standard beta distribution with parameters
$\xi(B)$ and $\xi(\Psi) - \xi(B)$. Relate this problem to Problem 56.


CHAPTER 28 
Stationary Sequences 


Many of the random sequences studied so far in this book are related in some 
significant way to ‘stationary sequences’. Informally speaking, a random se- 
quence is stationary if it models the successive states of some system that is in 
equilibrium. Thus, an important example of a stationary sequence is a Markov 
sequence whose initial distribution is an equilibrium distribution. Exchangeable 
sequences are also stationary. And there are many other important examples, 
some of which will be introduced in this chapter. 

After providing basic definitions and examples in the first three sections, we 
turn our attention in the fourth section to the Birkhoff Ergodic Theorem, which 
is a remarkable generalization of the Strong Law of Large Numbers to sequences 
with stationary (but not necessarily independent) increments. Section 5 intro- 
duces the concept of ‘ergodicity’: a stationary sequence is ergodic if its distribu- 
tion cannot be decomposed into a nontrivial mixture of distributions of stationary 
sequences. Important criteria for ergodicity are given. In Section 6, we improve 
the Strong Law of Large Numbers even further with a very useful result known 
as the Kingman-Liggett Subadditive Ergodic Theorem. In an entirely different 
direction, in the final section of the chapter, we take an introductory look at the 
‘spectral analysis’ of stationary sequences. 


28.1. Definitions 


A random sequence $X = (X_0, X_1, X_2, \dots)$ is stationary if it has the same
distribution as the random sequence $(X_1, X_2, X_3, \dots)$. We may also have doubly
infinite stationary sequences: a random sequence $(\dots, X_{-1}, X_0, X_1, \dots)$ is
stationary if the distribution of the sequence $(X_k, X_{k+1}, X_{k+2}, \dots)$ does not depend
on $k \in \mathbb{Z}$. Sometimes, in order to distinguish these two types of stationary
sequences, we use the term one-sided for a sequence $X = (X_0, X_1, X_2, \dots)$ and the
term two-sided for a sequence $(\dots, X_{-1}, X_0, X_1, \dots)$. For simplicity, we mostly
concern ourselves with one-sided stationary sequences $X$ whose terms are
$(\Psi, \mathcal{G})$-valued for some Borel space $(\Psi, \mathcal{G})$. Thus $X$ itself is $(\Theta, \mathcal{H})$-valued, where

$$(\Theta, \mathcal{H}) = \prod_{n=0}^{\infty} (\Psi, \mathcal{G}) = (\Psi, \mathcal{G})^{\infty}. \tag{28.1}$$


Define the shift transformation $\tau : \Theta \to \Theta$ by

$$\tau((\psi_0, \psi_1, \psi_2, \dots)) = (\psi_1, \psi_2, \psi_3, \dots).$$

We use the notation $\tau^k$ to denote the $k$-fold composition of $\tau$ with itself, with
$\tau^0$ denoting the identity function. Thus, for example,

$$\tau^3((\psi_0, \psi_1, \psi_2, \dots)) = (\psi_3, \psi_4, \psi_5, \dots).$$
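On a computer an infinite sequence can be represented lazily, and the shift then amounts to discarding initial terms. The following sketch models a point of the sequence space as a Python iterable; the helper names `shift` and `head` are illustrative assumptions.

```python
from itertools import count, islice

def shift(seq, k=1):
    """The k-fold shift: drop the first k terms (k = 0 is the identity)."""
    return islice(seq, k, None)

def head(seq, n):
    """First n terms of a (possibly infinite) iterable, as a list."""
    return list(islice(seq, n))

# The sequence (0, 1, 2, ...) shifted three times begins (3, 4, 5, ...).
print(head(shift(count(0), 3), 3))   # [3, 4, 5]
```

Composing two shifts adds the exponents: `shift(shift(seq, 1), 2)` produces the same terms as `shift(seq, 3)`.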


In order that a random sequence $X$ of $\Psi$-valued random variables be stationary, it
is obviously necessary and sufficient that $X$ and $\tau \circ X$ have the same distribution.
If $Q$ is the distribution of such a stationary sequence $X$, then $\tau$ is a
measure-preserving transformation on the probability space $(\Theta, \mathcal{H}, Q)$ in the sense that

$$Q(\tau^{-1}(A)) = Q(A) \quad \text{for all } A \in \mathcal{H}.$$

Thus stationary sequences can be treated by studying ‘ergodic theory’, the
theory of measure-preserving transformations. The following problem shows also
that questions in ergodic theory can be viewed as questions about stationary
sequences.
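The defining identity $Q(\tau^{-1}(A)) = Q(A)$ is easiest to examine on a finite space, where it suffices to check one-point sets. The sketch below is a toy illustration on a six-point space rather than on the sequence space $\Theta$: a rotation preserves the uniform distribution, while a map that is not one-to-one fails to do so. The space, the two maps, and the function names are assumptions made for illustration.

```python
from fractions import Fraction

def pushforward(q, tau):
    """Return the measure A -> Q(tau^{-1}(A)), stored by its point masses."""
    out = {x: Fraction(0) for x in q}
    for x, mass in q.items():
        out[tau(x)] += mass      # the mass at x lands on tau(x)
    return out

def is_measure_preserving(q, tau):
    """True when Q(tau^{-1}({x})) = Q({x}) for every point x."""
    return pushforward(q, tau) == q

n = 6
uniform = {x: Fraction(1, n) for x in range(n)}
rotation = lambda x: (x + 1) % n     # a bijection of {0, ..., 5}
collapse = lambda x: (2 * x) % n     # maps onto {0, 2, 4} only

print(is_measure_preserving(uniform, rotation))
print(is_measure_preserving(uniform, collapse))
```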


Problem 1. Let $(\Omega, \mathcal{F}, P)$ be an arbitrary probability space, and let $\tau : \Omega \to \Omega$ be
a measurable function. Let $X_0$ be the identity function on $\Omega$, and for $n = 1, 2, \dots$,
define random variables $X_n$ inductively by setting $X_n = \tau \circ X_{n-1}$. Show that the
sequence $X = (X_0, X_1, X_2, \dots)$ is stationary if and only if $\tau$ is measure-preserving.


Problem 2. For the particular case of the preceding problem in which $(\Omega, \mathcal{F})$ equals
$(\Theta, \mathcal{H})$ as defined in (28.1) and $\tau$ equals the shift transformation, write an explicit
formula for each $X_n$ in terms of the coordinates of members of $\Theta$.


Problem 3. Show that a one-sided sequence $X$ is stationary if and only if for
all nonnegative integers $k$ and $n$, $(X_0, X_1, \dots, X_k)$ has the same distribution as
$(X_n, X_{n+1}, \dots, X_{n+k})$. Use this fact to prove that every exchangeable sequence is
stationary.


* Problem 4. Let $(X_0, X_1, \dots)$ be a stationary sequence, and fix integers $m \ge 0$ and
$k > 0$. Prove that the sequence $(X_m, X_{m+k}, X_{m+2k}, \dots)$ is a stationary sequence.


Problem 5. Let $X$ be a one-sided stationary sequence of $\Psi$-valued random
variables, $\tau$ the shift transformation on $\Theta = \Psi^{\infty}$, and $g : \Theta \to \Psi$ a measurable function.
Show that the sequence

$$(g \circ X,\ g \circ \tau \circ X,\ g \circ \tau^2 \circ X,\ \dots)$$

is stationary.




* Problem 6. Let $(\Psi, \mathcal{G})$ be a Borel space. Show that if $X = (X_0, X_1, X_2, \dots)$ is a
one-sided stationary sequence of $\Psi$-valued random variables, then there exists a
two-sided stationary sequence $Y = (\dots, Y_{-1}, Y_0, Y_1, \dots)$ such that $X$ has the same
distribution as the one-sided sequence $(Y_0, Y_1, Y_2, \dots)$.


28.2. Notation 


Throughout this chapter, it will be useful to agree on some standard notation
and assumptions. The space $\Psi$ in which the terms of random sequences typically
take their values is a Borel space, and the corresponding $\sigma$-field, which will
often not be mentioned, is denoted by $\mathcal{G}$. The Borel space $\Theta$ and $\sigma$-field $\mathcal{H}$ are
defined by (28.1), with $\mathcal{H}$ often omitted. The symbol $\tau$ is reserved for the shift
transformation on $\Theta$. With respect to a probability measure $Q$ on $\Theta$, $\tau$ may
or may not be measure-preserving. Correspondingly, we say that $Q$ is or is not
shift-invariant. We denote by $\mathcal{M}$ the collection of all shift-invariant probability
measures on $\Theta$, and a member of $\mathcal{M}$ will often be called a distribution since
it may be the distribution of a stationary sequence or, as in Problem 2, the
distribution of the first term of a stationary sequence.

The o-field 

S={AEH: A=r ‘(A)}, 
is called the shift-invariant o-field of H, with the phrase “of H” often being 
dropped. 

There is another sub-o-field of H that is useful in the context of stationary 
sequences. Even though r might not be one-to-one, 7~' may be viewed as a 
function from H to H. As such it can be composed with itself. For positive 
integers n, let T~” denote n-fold composition of r~! with itself. Set 


Hn = {r-"(A): A € H}, 
and m 
TS e 
n=l] 


We call 7 the tail o-field of H. The reader may recall that our definition of tail 
o-field in Chapter 12 was more general than the one given here. In this chapter, 
the tail o-field will always be defined in terms of the o-fields Hn as above, and not 
in terms of other natural sequences of o-fields, such as (Xn, Xn41,...). Also 
note that we have not bothered to define the tail o-field for the two-sided case. 
In that context, our definition of Hn is inappropriate, since it gives Hn = H. 
Nevertheless, the reader who is interested in two-sided sequences should have no 
difficulty in making appropriate modifications. 

Example 3 of the next section shows that the invariant and tail o-fields are 
not necessarily the same. But there is a relation between them as indicated in 
the following problem. 




Problem 7. Prove that $\mathcal{S} \subseteq \mathcal{T}$.


Problem 8. Let $X$ be a stationary sequence taking values in some Borel space.
Denote the conditional distribution of $X$ given the σ-field $X^{-1}(\mathcal{S})$ by $\omega \mapsto R^{\omega}$.
Prove that for $P$-almost every $\omega$, $R^{\omega}$ is the distribution of a stationary sequence.


Problem 9. Let $X$ be an $\mathbb{R}$-valued stationary sequence and $A$ a member of $X^{-1}(\mathcal{S})$.
Show that the sequence $(X_n I_A: n = 0, 1, 2, \dots)$ is stationary.


Problem 10. Show that the set $\mathcal{M}$ of all shift-invariant measures is a convex set of
measures. That is, show that if $R, S \in \mathcal{M}$, then for any $t \in [0,1]$, $tR + (1-t)S \in \mathcal{M}$.


28.3. Examples 


We have already seen that exchangeable sequences (including, of course, iid se- 
quences) are stationary. The following example uses such sequences to illustrate 
some of the concepts of the preceding section. 


Example 1. Let $X$ be an exchangeable sequence in a Borel space $(\Psi, \mathcal{G})$. Let
$Y$ denote its limiting empirical distribution (see Chapter 27), and set

\[ R(\omega, \cdot) = \bigotimes_{k=0}^{\infty} Y(\omega, \cdot) . \]

Our goal is to show that $R$ is a version of the conditional distribution introduced
in Problem 8. By Proposition 8 of Chapter 27, $R$ is $X^{-1}(\mathcal{S})$-measurable.
To complete the argument we need to show

\[ E(R(B) I_C) = P([X \in B] \cap C) \tag{28.2} \]

for $B \in \mathcal{H}$ and $C \in X^{-1}(\mathcal{S})$. Write $C = X^{-1}(A)$ for $A \in \mathcal{S}$. The desired relation
(28.2) can be written as

\[ E(R(B) I_A(X)) = P[X \in B \cap A] . \tag{28.3} \]

The left side of (28.3) equals

\[ E(E(R(B) I_A(X) \mid Y)) = E(R(B) E(I_A(X) \mid Y)) = E(R(B) R(A)) . \tag{28.4} \]

By the Kolmogorov 0-1 Law, $R(\omega, \cdot)$, being a product measure on $\Theta$ for each
$\omega$, assigns probability 0 or 1 to each member of $\mathcal{T}$. By Problem 7, $A \in \mathcal{T}$, and
therefore for every $\omega$, $R(\omega, A) = 0$ or $R(\omega, A) = 1$. Thus

\[ R(\omega, A) R(\omega, B) = R(\omega, A \cap B) \]

for all $\omega$. This equality enables us to conclude that (28.4) equals $P(X \in B \cap A)$,
as desired.




Our next example indicates which Markov sequences are stationary. 


Example 2. Let $X = (X_0, X_1, X_2, \dots)$ be a Markov sequence with transition
operator $T$ and initial distribution $Q_0$. If $Q_0$ is an equilibrium distribution for
$T$, then the sequence $Y = (X_1, X_2, X_3, \dots)$ has the same initial distribution
and transition operator as $X$, so it has the same distribution as $X$. Thus, $X$
is a stationary sequence. We will sometimes use the term stationary Markov
sequence to describe a sequence like $X$.


Example 3. [Rotations of the circle] Let $X_0$ be a random variable that is
uniformly distributed on $[0,1)$, and let $a$ be an arbitrary real number. Set
$X_n = an + X_0 \pmod 1$, $n \in \mathbb{Z}^+$. Then the sequence $X = (X_0, X_1, X_2, \dots)$
is stationary. It is not hard to prove this fact directly, but we will prove it using
some of the theory that has been developed so far.

Clearly $X$ is a Markov sequence with state space $[0,1)$ (the transition operator
takes a function $x \mapsto f(x)$ to the function $x \mapsto f(x + a \pmod 1)$). It is also easy
to see that the uniform distribution on $[0,1)$ is an equilibrium distribution, using
the translation invariance of Lebesgue measure. Since $X_0$ has this distribution,
$X$ is a stationary sequence by Example 2.

Any sequence distributed like $X$ is called a stationary rotation of the circle.
The quantity $2\pi a$ is known as the rotation angle.

The fact that $x \mapsto x + a \pmod 1$ is a one-to-one function having a measurable
inverse implies that the tail σ-field $\mathcal{T}$ equals the σ-field $\mathcal{H}$ of measurable sets of
infinite sequences of members of $[0,1)$. The character of the shift-invariant σ-field
$\mathcal{S}$ depends on the rotation angle $2\pi a$. If $a$ is rational, let $q$ denote its denominator
when written in lowest terms with positive denominator. The set $[0,1)$ viewed
as a circle is the union of equivalence classes each of which contains $q$ equally
spaced members. In order for a measurable subset of $[0,1)$ to belong to $X^{-1}(\mathcal{S})$
it is necessary and sufficient that it contain all members of an equivalence class
whenever it contains any one of them. A similar statement is true in case $a$ is
irrational, except that each equivalence class is a countable set dense in $[0,1)$.
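
The dependence on the rotation angle can be seen numerically. The following is a minimal sketch, not part of the text's argument: it fixes one starting point and compares the fraction of time the orbit spends in the arc $[0, 0.3)$ for an irrational and a rational value of $a$. The function names, the starting point 0.1, the arc, and the orbit length are all illustrative assumptions.

```python
import math

def rotation_orbit(x0, a, n):
    """Return [X_0, ..., X_{n-1}] with X_k = (x0 + k*a) mod 1."""
    return [(x0 + k * a) % 1.0 for k in range(n)]

def visit_frequency(orbit, lo, hi):
    """Fraction of orbit points lying in the arc [lo, hi)."""
    return sum(lo <= x < hi for x in orbit) / len(orbit)

# Illustrative choices (assumptions): starting point 0.1, arc [0, 0.3).
irrational = rotation_orbit(0.1, math.sqrt(2), 20000)
rational = rotation_orbit(0.1, 0.25, 20000)   # q = 4: the orbit has 4 points

freq_irrational = visit_frequency(irrational, 0.0, 0.3)
freq_rational = visit_frequency(rational, 0.0, 0.3)
print(freq_irrational, freq_rational)
```

With $a = 1/4$ the orbit of 0.1 is the four-point equivalence class $\{0.1, 0.35, 0.6, 0.85\}$, so the visit frequency is determined by how many of those four points fall in the arc and generally differs from the arc length; with $a = \sqrt{2}$ the frequency is close to the arc length 0.3.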


Example 4. Let $X_0$ be as in the preceding example, and let $X_n = 2X_{n-1} =
2^n X_0 \pmod 1$ for $n = 1, 2, \dots$. In this example as in the preceding one, the
entire sequence $X$ is determined by its first term $X_0$. However, this sequence is
unlike the preceding example, in that the knowledge of the sequence for large
$n$ does not determine $X_0$. In fact, we will argue that in this example, the tail
σ-field is very small; that is, each of its members has probability 0 or 1.

By writing each member of $[0,1)$ in binary, we can view $X$ as an infinite
sequence, each of whose members is itself an infinite sequence of 0’s and 1’s.
The distribution of the infinite sequence $X_0$ is the fair-coin-flip measure, and $X_n$
is obtained from $X_{n-1}$ by discarding the first term of $X_{n-1}$ and then subtracting
1 from the indices on the remaining terms. Clearly $X_n$ does not determine $X_0$
if $n \ge 1$.
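
The binary description can be made concrete. The sketch below is an illustration under stated assumptions: $X_0$ is represented by a truncated list of 200 fair coin flips (a real $X_0$ has infinitely many digits), and the helper names are hypothetical. Working with digit lists rather than floating-point numbers avoids the loss of precision that repeated doubling causes.

```python
import random

def doubling_orbit_bits(bits, n):
    """Binary digits of X_n = 2^n X_0 (mod 1): drop the first n digits of X_0."""
    return bits[n:]

def to_float(bits):
    """Approximate value in [0, 1) of a finite binary expansion."""
    return sum(b / 2.0 ** (i + 1) for i, b in enumerate(bits))

rng = random.Random(0)
# Assumption: a 200-digit truncation stands in for the fair-coin-flip expansion of X_0.
x0_bits = [rng.randint(0, 1) for _ in range(200)]
x3_bits = doubling_orbit_bits(x0_bits, 3)

# A different starting point that agrees with X_0 from the fourth digit on.
other_bits = [1 - b for b in x0_bits[:3]] + x0_bits[3:]
same_x3 = doubling_orbit_bits(other_bits, 3) == x3_bits
print(same_x3, to_float(x0_bits), to_float(other_bits))
```

Two starting points that differ only in their first three digits produce the same $X_3$, which is the sense in which $X_n$ does not determine $X_0$.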


Example 5. [Stationary Gaussian sequences] Let $U = (U_1, U_2, \dots)$ be an
iid sequence of random variables uniformly distributed on $[0, 2\pi)$, and let $Z =
(Z_1, Z_2, \dots)$ be an iid sequence of random variables with the standard normal
distribution. Assume that the pair $(U, Z)$ is independent. Also, let $(a_1, a_2, \dots)$
and $(b_1, b_2, \dots)$ be two sequences of real constants. Set

\[ X_n = \sum_{k=1}^{\infty} b_k Z_k \cos(n a_k + U_k), \qquad n \in \mathbb{Z}^+ . \]

By the Kolmogorov Three-Series Theorem, this sum converges a.s. if and only if
$\sum_k b_k^2 < \infty$, in which case the sequence $X = (X_0, X_1, X_2, \dots)$ is a.s. well-defined.
The reader is asked in Problem 14 to show that the sequence $X$ is stationary
and ‘Gaussian’. (A random sequence of $\mathbb{R}$-valued random variables is Gaussian
if all of its finite-dimensional marginal distributions are normal distributions.)
More will be said about Gaussian sequences at the end of this chapter.


Problem 11. [Moving averages] Let $X = (X_0, X_1, X_2, \dots)$ be a stationary sequence
of $\mathbb{R}$-valued random variables. Fix a positive integer $k$, and let

\[ Y_n = \frac{X_n + X_{n+1} + \dots + X_{n+k-1}}{k} . \]

Show that $Y = (Y_0, Y_1, Y_2, \dots)$ is stationary.


Problem 12. Let X and Y be stationary sequences of R-valued random variables. 
Show that if X and Y are independent, then the sequence X +Y is stationary. Also, 
show by example that the independence assumption cannot be entirely eliminated. 


Problem 13. [Stationary renewal sequences] Show that the delayed renewal se- 
quence defined in Theorem 15 of Chapter 25 is a stationary sequence. 


Problem 14. Show that the sequence X defined in Example 5 is stationary and 
Gaussian. Hint: Example 2, Example 3, and Problem 12 are all helpful in this 
argument. 


28.4. The Birkhoff Ergodic Theorem 
This section is devoted to the following result. 


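Theorem 1. [Birkhoff Ergodic] Let $X = (X_n: n = 0, 1, 2, \dots)$ be an $\mathbb{R}$-valued
stationary sequence, and let $S_n = X_0 + X_1 + \dots + X_{n-1}$. If $E(X_0 \mid X^{-1}(\mathcal{S}))$ is
a.s.-defined, then

\[ \lim_{n \to \infty} \frac{S_n}{n} = E(X_0 \mid X^{-1}(\mathcal{S})) \quad \text{a.s.,} \tag{28.5} \]

and if $E(|X_0|) < \infty$, then

\[ \lim_{n \to \infty} E\Bigl( \Bigl| \frac{S_n}{n} - E(X_0 \mid X^{-1}(\mathcal{S})) \Bigr| \Bigr) = 0 . \tag{28.6} \]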


PROOF. We first treat the special case in which $E(X_0 \mid X^{-1}(\mathcal{S})) = 0$ a.s. and
$E(X_0)$ exists, in which case $E(X_0)$ necessarily equals 0. For this case our goal
is to prove that as $n \to \infty$, $S_n/n \to 0$ a.s. and in $L_1$.

Fix $\varepsilon > 0$ and set

\[ A = \Bigl\{ \omega: \limsup_{n \to \infty} \frac{S_n(\omega)}{n} > \varepsilon \Bigr\} . \]

We will prove that $P(A) = 0$. Once we prove this fact, the proof of the desired
almost sure convergence is completed by applying the same argument to the
sequence $-X$.

Note that $A \in X^{-1}(\mathcal{S})$, so $E(X_0 I_A) = E(E(X_0 \mid X^{-1}(\mathcal{S})) I_A) = 0$. Thus,

\[ E((X_0 - \varepsilon) I_A) = -\varepsilon P(A) . \]

Therefore, in order to show that $P(A) = 0$, it suffices to prove that

\[ E((X_0 - \varepsilon) I_A) \ge 0 . \tag{28.7} \]

For $n = 1, 2, 3, \dots$, set $S'_n = S_{n+1} - X_0$. Let

\[ M_n = \max\{ (S_1 - \varepsilon), (S_2 - 2\varepsilon), \dots, (S_n - n\varepsilon) \}, \]
\[ M'_n = \max\{ (S'_1 - \varepsilon), (S'_2 - 2\varepsilon), \dots, (S'_n - n\varepsilon) \} . \]

By the definition of $S'_n$,

\[ (X_0 - \varepsilon) + (0 \vee M'_n) \ge M_n, \qquad n = 0, 1, 2, \dots . \]

Since $X_0 = S_1$, it follows that

\[ X_0 - \varepsilon \ge M_n - (0 \vee M'_n), \]

so

\[ E((X_0 - \varepsilon) I_A) \ge E((M_n - (0 \vee M'_n)) I_A)
   = E([(0 \vee M_n) - (0 \vee M'_n)] I_A) + E([M_n - (0 \vee M_n)] I_A), \]

for $n = 0, 1, 2, \dots$. Since $X$ is stationary and $A \in X^{-1}(\mathcal{S})$, the random variables
$(0 \vee M_n) I_A$ and $(0 \vee M'_n) I_A$ have the same distribution. Thus, the first expectation
on the right side of the preceding inequality is 0, so that inequality becomes

\[ E((X_0 - \varepsilon) I_A) \ge E((M_n - (0 \vee M_n)) I_A), \qquad n = 0, 1, 2, \dots . \tag{28.8} \]

By the definitions of $M_n$ and $A$, $\lim_n M_n(\omega) > 0$ for all $\omega \in A$, so

\[ \lim_{n \to \infty} [M_n - (0 \vee M_n)] I_A = 0 . \tag{28.9} \]

By the definition of $M_n$, $(X_0 - \varepsilon) \le M_n$, so $|X_0 - \varepsilon| \ge |M_n - (0 \vee M_n)|$. Since we
have assumed that $E(|X_0|) < \infty$, the Dominated Convergence Theorem, (28.8),
and (28.9) imply that

\[ E((X_0 - \varepsilon) I_A) \ge \lim_{n \to \infty} E((M_n - (0 \vee M_n)) I_A) = 0, \]

proving (28.7).

For the special case thus far treated, the proof of (28.6) is essentially the same 
as the proof of the corresponding fact in the Strong Law of Large Numbers (see 
the proof of Lemma 13 of Chapter 12). 

Next we drop the assumption that $E(X_0 \mid X^{-1}(\mathcal{S})) = 0$ a.s., but continue
to assume $E(|X_0|) < \infty$. Thus $E(X_0 \mid X^{-1}(\mathcal{S}))$ is finite a.s. The sequence
$(X_n - E(X_0 \mid X^{-1}(\mathcal{S})): n = 0, 1, 2, \dots)$, which is the same as the sequence
$(X_n - E(X_n \mid X^{-1}(\mathcal{S})): n = 0, 1, 2, \dots)$, is stationary. Apply what has already
been proved to this stationary sequence to obtain (28.5) and (28.6) whenever
$E(|X_0|) < \infty$.

Finally we drop the assumption that $E(|X_0|) < \infty$, assuming instead only
that $E(X_0 \mid X^{-1}(\mathcal{S}))$ is a.s. defined. To finish the proof we only need prove
(28.5) in this situation. This is easily done by treating the stationary sequences
$(X_n^+ \wedge c_1: n = 0, 1, \dots)$ and $(X_n^- \wedge c_2: n = 0, 1, \dots)$ and letting $c_1 \to \infty$ and
$c_2 \to \infty$. □
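
A small simulation may help show why the limit in (28.5) is a conditional expectation rather than a constant. The sketch below is illustrative only: it uses an exchangeable (hence stationary) sequence built by first choosing a coin bias at random and then flipping that coin repeatedly. The biases 0.2 and 0.8, the sample sizes, and the function names are assumptions made for the example.

```python
import random

def time_average(p, n, rng):
    """Average of n iid Bernoulli(p) draws, i.e. S_n / n for that coin."""
    return sum(rng.random() < p for _ in range(n)) / n

def mixture_time_averages(biases, n, trials, seed=0):
    """For each trial pick a bias at random, then average n flips with that bias."""
    rng = random.Random(seed)
    out = []
    for _ in range(trials):
        p = rng.choice(biases)          # hidden random parameter of the sequence
        out.append((p, time_average(p, n, rng)))
    return out

results = mixture_time_averages([0.2, 0.8], n=20000, trials=6)
for p, avg in results:
    print(p, round(avg, 3))
```

Each time average settles near the bias chosen for that run, not near the overall mean $E(X_0) = 0.5$; the randomly chosen bias plays the role of $E(X_0 \mid X^{-1}(\mathcal{S}))$ in this example.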


Problem 15. [Maximal Ergodic Lemma] Let $X$ be an $\mathbb{R}$-valued stationary sequence
such that $E(|X_0|) < \infty$. Let $S_n = X_0 + \dots + X_{n-1}$ and $M_n = \max\{S_1, S_2, \dots, S_n\}$.
Prove that

\[ E(X_0; [M_n > 0]) \ge 0 . \]

Hint: Use ideas from the proof of the Birkhoff Ergodic Theorem.


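Problem 16. Let $X$ be a stationary sequence of $(\Psi, \mathcal{G})$-valued random variables.
Fix a measurable set $A \in \mathcal{G}$, and define

\[ T_n = \begin{cases} 0 & \text{for } n = 0, \\ \inf\{k > T_{n-1}: X_k \in A\} & \text{for } n > 0 . \end{cases} \]

Show that

\[ \lim_{n \to \infty} \frac{T_n}{n} = \frac{1}{P([X_0 \in A] \mid X^{-1}(\mathcal{S}))} \quad \text{a.s.,} \]

where it is understood that $1/0 = \infty$.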




28.5. Ergodicity 


In Chapter 12 we saw examples of σ-fields containing events other than $\emptyset$ and
$\Omega$ but which nevertheless have the property that each member is an event of
probability 0 or 1. We will call such σ-fields 0-1 trivial. Of course, the trivial
σ-field is also 0-1 trivial.

It is apparent from the statement and proof of the Birkhoff Ergodic Theorem
that the shift-invariant σ-field $\mathcal{S}$ plays an important role for stationary sequences,
somewhat reminiscent of the role of the tail σ-field in the study of iid sequences.
However, for general stationary sequences, there is no 0-1 law for $\mathcal{S}$, that is, $\mathcal{S}$
is not necessarily a 0-1 trivial σ-field. And when $\mathcal{S}$ is not 0-1 trivial, the limit in
the Birkhoff Ergodic Theorem need not be a constant. The following definition
gives us terminology for describing whether or not a 0-1 law is in force for a
given stationary sequence.


Definition 2. A stationary sequence $X$ with distribution $Q$ is ergodic if the
shift-invariant σ-field $\mathcal{S}$ is 0-1 trivial under $Q$. In this case, we also say that the
distribution $Q$ is ergodic.


If $X$ is an ergodic sequence defined on some probability space $(\Omega, \mathcal{F}, P)$, then
the sub-σ-field $X^{-1}(\mathcal{S})$ of $\mathcal{F}$ is 0-1 trivial under $P$. Thus if $E(X_0 \mid X^{-1}(\mathcal{S}))$ is
defined with positive probability, then it is a.s.-defined and equals a constant a.s.
In this case therefore, the limit in the Birkhoff Ergodic Theorem is a constant,
just as in the Strong Law of Large Numbers.

By the Kolmogorov 0-1 Law and Problem 7, any iid sequence is ergodic. We
will also see in the exercises that a stationary Markov sequence is ergodic if its
initial distribution is not a mixture of two or more equilibrium distributions,
and an infinite exchangeable sequence is ergodic if and only if it is iid. Many
stationary sequences are not ergodic. In general, the most that can be said for
a stationary sequence $X$ taking values in a Borel space is that the conditional
distribution of $X$ given $X^{-1}(\mathcal{S})$ is ergodic almost surely.


Problem 17. Let $X_0$ be uniformly distributed on $[-1,1]$ and for $n > 0$, set $X_n =
(-1)^n X_0$. Clearly the sequence $(X_0, X_1, \dots)$ is stationary. Show that it is not
ergodic but that the limit in the Birkhoff Ergodic Theorem is a constant a.s.


Problem 18. Let $Q$ be an ergodic distribution on $(\Theta, \mathcal{H})$. Prove that the sequence
$(g, g \circ \tau, g \circ \tau^2, \dots)$ is ergodic under $Q$. Restate the conclusion of this problem
using the notation of Problem 5.


Problem 19. Let $Q$ be the distribution of a stationary sequence $X$ whose terms
take values in a Borel space. Show that $X$ is ergodic if and only if the conditional
distribution of $X$ given $X^{-1}(\mathcal{S})$ is almost surely equal to $Q$.


Problem 20. Let X be an infinite exchangeable sequence of random variables tak- 
ing values in some Borel space. Show that X is ergodic if and only if X is iid. 




We wish to develop some useful criteria for ergodicity. For our first criterion, 
we need a definition. 


Definition 3. Let $\mathcal{P}$ be a convex set of distributions on some measurable
space. A measure $Q \in \mathcal{P}$ is extremal in $\mathcal{P}$ if $Q$ cannot be written in the form
$Q = tR + (1-t)S$ for some $t \in (0,1)$ and distinct measures $R, S \in \mathcal{P}$.


Another way of stating the previous definition is as follows: $Q$ is extremal in
$\mathcal{P}$ if the equation $Q = tR + (1-t)S$ implies $Q = R = S$ whenever $t \in (0,1)$ and
$R, S \in \mathcal{P}$.


Theorem 4. Let $\mathcal{M}$ be the set of shift-invariant distributions on $(\Theta, \mathcal{H})$. A
measure $Q \in \mathcal{M}$ is ergodic if and only if it is extremal in $\mathcal{M}$.


PROOF. Suppose that $Q$ is an ergodic member of $\mathcal{M}$ and $Q = tR + (1-t)S$ for
some $t \in (0,1)$ and $R, S \in \mathcal{M}$. It follows easily from the definition of ergodicity
that $R$ and $S$ are ergodic. Let $B$ be any member of $\mathcal{H}$. By Problem 5 and
Problem 18, the sequence

\[ Y = (I_B,\; I_B \circ \tau,\; I_B \circ \tau^2,\; \dots) \]

is an $\mathbb{R}$-valued ergodic stationary sequence. By the Birkhoff Ergodic Theorem,

\[ \lim_{n \to \infty} \frac{Y_0 + Y_1 + \dots + Y_{n-1}}{n} = Q(B) \quad Q\text{-a.s.} \]

Since $Q = tR + (1-t)S$ and $R$ and $S$ are ergodic, the Birkhoff Ergodic Theorem
also implies that the limit equals $R(B)$ with $Q$-probability $t$ and $S(B)$ with
$Q$-probability $(1-t)$. Thus $Q(B) = R(B) = S(B)$. Since $B$ is an arbitrary member
of $\mathcal{H}$, it follows that $Q = R = S$, so $Q$ is extremal.

Now suppose $Q$ is not ergodic. Let $A$ be a member of $\mathcal{S}$ such that $0 < Q(A) <
1$. Let $R(B) = Q(B \cap A)/Q(A)$ and $S(B) = Q(B \cap A^c)/Q(A^c)$ for all $B \in \mathcal{H}$.
Since $A \in \mathcal{S}$, it is easily checked that $R$ and $S$ are shift-invariant distributions.
Since $Q = tR + (1-t)S$ with $t = Q(A)$, $Q$ is not extremal. □
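
The check left to the reader in the second half of the proof can be written out as follows. This is a worked verification under the proof's own hypotheses ($A \in \mathcal{S}$, so $\tau^{-1}(A) = A$, and $Q$ shift-invariant), with $t = Q(A)$; the same computation with $A^c$ in place of $A$ handles $S$.

```latex
\begin{align*}
R(\tau^{-1}(B))
  &= \frac{Q(\tau^{-1}(B) \cap A)}{Q(A)}
   = \frac{Q(\tau^{-1}(B) \cap \tau^{-1}(A))}{Q(A)}
   = \frac{Q(\tau^{-1}(B \cap A))}{Q(A)}
   = \frac{Q(B \cap A)}{Q(A)}
   = R(B), \\
tR(B) + (1-t)S(B)
  &= Q(B \cap A) + Q(B \cap A^{c})
   = Q(B).
\end{align*}
```

The measures $R$ and $S$ are distinct because $R(A) = 1$ while $S(A) = 0$, which is what the definition of extremal requires.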


* Problem 21. Show that if R and S are distinct ergodic measures in M, then R 
and S are mutually singular. Hint: Look at the proof of Theorem 4. 


Problem 22. Let $T$ be a Markov transition operator, and let $\mathcal{N}$ be the collection
of equilibrium distributions for $T$. Show that a stationary Markov sequence with
transition operator $T$ is ergodic if and only if its initial distribution is extremal in
$\mathcal{N}$. Use this fact to show that a stationary Markov sequence with countable state
space is ergodic if its transition operator is irreducible.




* Problem 23. Let $X$ be a stationary rotation of the circle through angle $2\pi a$. (As
indicated in Example 3, the initial distribution of a stationary rotation is uniform,
by definition.) Show that $X$ is ergodic if and only if $a$ is irrational. As a
consequence, deduce the Weyl Equidistribution Theorem: If $a$ is irrational,

\[ \lim_{n \to \infty} \frac{I_A \circ X_0 + \dots + I_A \circ X_{n-1}}{n} = \lambda(A) \quad \text{a.s.,} \]

where $A$ is any Borel subset of $[0,1)$ and $\lambda$ is Lebesgue measure. Hint: When $a$ is
irrational, show that there is only one shift-invariant distribution for the relevant
shift transformation on the appropriate product space.


Problem 24. Let $\mathcal{E}$ be the set of ergodic shift-invariant distributions on $(\Theta, \mathcal{H})$.
Show that for any $Q \in \mathcal{M}$, there exists a random $\mathcal{E}$-valued distribution $R$ such
that $Q(B) = E(R(B))$ for all $B \in \mathcal{H}$. Explain how this formula may be viewed as
giving a decomposition of $Q$ into a ‘convex combination’ of ergodic measures.


Problem 25. Let $X$ be a Markov sequence on $\mathbb{Z}^+$ with transition matrix $T =
(T(i,j): i, j \ge 0)$. Suppose $T$ is irreducible and positive recurrent, with equilibrium
distribution $Q_0$. For each state $i$, let $f_i$ be the indicator function of the set $\{i\}$.
For each pair of states $i, j$, calculate in terms of $Q_0$ and $T$ the almost sure limit as
$n \to \infty$ of

\[ \frac{1}{n} \sum_{k=0}^{n-1} f_i(X_k) f_j(X_{k+1}) . \]

Hint: First consider the case in which the distribution of $X_0$ is $Q_0$.


We turn our attention now to a second criterion for ergodicity, one which 
establishes a relationship between the ergodicity of a stationary sequence X and 
the amount of dependence that exists between the individual random variables 
in the sequence. Before stating the criterion, we introduce some terminology. 


Definition 5. A $(\Theta, \mathcal{H})$-valued stationary sequence $X$ and its distribution $Q$
are weakly mixing if

\[ \lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} Q(A \cap \tau^{-k}(B)) = Q(A)\, Q(B) \quad \text{for all } A, B \in \mathcal{H}, \]

mixing if

\[ \lim_{n \to \infty} Q(A \cap \tau^{-n}(B)) = Q(A)\, Q(B) \quad \text{for all } A, B \in \mathcal{H}, \]

and strongly mixing if

\[ \lim_{n \to \infty} \sup_{B \in \mathcal{H}} |Q(A \cap \tau^{-n}(B)) - Q(A)\, Q(B)| = 0 \quad \text{for all } A \in \mathcal{H} . \]

Roughly speaking, mixing is a kind of asymptotic independence. For example,
with $(\Theta, \mathcal{H}) = (\Psi, \mathcal{G})^\infty$, a mixing stationary sequence $(X_n: n = 0, 1, \dots)$ satisfies

\[ \lim_{n \to \infty} P([X_0 \in A] \cap [X_n \in B]) = P[X_0 \in A]\, P[X_0 \in B] \]

for $A, B \in \mathcal{G}$. From this same point of view, strongly mixing is uniform asymptotic
independence, and weakly mixing could be understood as asymptotic
independence ‘on the average’. Clearly, strongly mixing ⟹ mixing ⟹ weakly
mixing.
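
The difference between the first two limits can be seen in a two-state stationary Markov sequence, where $P([X_0 = 0] \cap [X_n = 1])$ equals the equilibrium mass at 0 times an entry of the $n$-step transition matrix. The sketch below only examines this one pair of events, so it illustrates the definitions rather than verifying them for all $A, B \in \mathcal{H}$; the two transition matrices and the helper names are assumptions chosen for the example.

```python
def mat_mul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def joint_probs(pi, T, n_max):
    """Return [P(X_0 = 0, X_n = 1) for n = 0, ..., n_max - 1] for a stationary chain."""
    power = [[1.0, 0.0], [0.0, 1.0]]        # T^0
    out = []
    for _ in range(n_max):
        out.append(pi[0] * power[0][1])
        power = mat_mul(power, T)
    return out

def cesaro(values):
    """Arithmetic mean, i.e. the averaged quantity in the weak mixing limit."""
    return sum(values) / len(values)

# Assumed examples: an aperiodic chain and the deterministic flip 0 -> 1 -> 0.
T_mix, pi_mix = [[0.9, 0.1], [0.3, 0.7]], [0.75, 0.25]
T_flip, pi_flip = [[0.0, 1.0], [1.0, 0.0]], [0.5, 0.5]

mix_probs = joint_probs(pi_mix, T_mix, 200)
flip_probs = joint_probs(pi_flip, T_flip, 200)
print(mix_probs[-1], pi_mix[0] * pi_mix[1])
print(flip_probs[-2:], cesaro(flip_probs), pi_flip[0] * pi_flip[1])
```

For the aperiodic chain the joint probabilities themselves approach the product $0.75 \times 0.25$. For the flip chain they alternate between 0 and 0.5 and so have no limit, yet their averages equal the product $0.5 \times 0.5$.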


Theorem 6. A stationary sequence $X$ is ergodic if and only if it is weakly
mixing.


PROOF. Suppose $X$ is weakly mixing. Choose $A \in \mathcal{S}$. For all $k \in \mathbb{Z}^+$,
$Q(A \cap \tau^{-k}(A)) = Q(A)$, since $\tau^{-k}(A) = A$. So the definition of weakly mixing
implies that $Q(A) = Q(A)^2$. Thus $Q(A) = 0$ or 1, and it follows that $X$ is
ergodic.

Now suppose $X$ is ergodic. Choose $A, B \in \mathcal{H}$ and, for $n = 0, 1, 2, \dots$, let
$Y_n = I_B \circ \tau^n \circ X$. The sequence $Y = (Y_n: n = 0, 1, 2, \dots)$ is stationary by
Problem 5 and ergodic by Problem 18. We now calculate:

\[ \lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} Q(A \cap \tau^{-k}(B))
   = \lim_{n \to \infty} E\Bigl( (I_A \circ X)\, \frac{Y_0 + \dots + Y_{n-1}}{n} \Bigr)
   = Q(A)\, Q(B), \]

the last equality following from the Birkhoff Ergodic Theorem and the Bounded
Convergence Theorem. □


Problem 26. Show that the stationary rotation of the circle through angle $2\pi a$
is not mixing for any $a$. What is the story with respect to weak mixing? Hint:
Problem 23 may be useful.


Problem 27. Let X and Y be independent stationary sequences of R-valued ran- 
dom variables. Show that X + Y is ergodic if and only if both X and Y are 
ergodic. 


* Problem 28. Show that $X$ is strongly mixing if and only if the tail σ-field $\mathcal{T}$ is 0-1
trivial under the distribution of $X$.


28.6. † The Kingman-Liggett Subadditive Ergodic Theorem


The main result of this section is a useful generalization of the Birkhoff Er- 
godic Theorem. The hypotheses may seem strange at first glance, so we give an 
example to show how these hypotheses can arise in a natural way. 


Example 6. Let X = (Xn: n =0,1,2,...) be a stationary sequence of Rt- 
valued random variables. For n > 0, let Sn = Xo+---+Xn_1, and for0 <m <n, 
set 

Line Sen = 


and 
Fig hows 




In the special case that $X$ is an iid sequence, the sequence $(S_n)$ is a random
walk, and the sequence $R = (R_n: n = 0, 1, 2, \dots)$ is the subject of Theorem 28
in Chapter 12.

It will be seen later in this section that the doubly indexed sequence $\widetilde{R} =
(R_{m,n}: m, n \ge 0)$ can be used to analyze the asymptotic behavior of the terms
in the sequence $R$. We give here the properties of $\widetilde{R}$ that make such an analysis
possible.

First, we note the following obvious inequality:

\[ R_{0,n} \le R_{0,m} + R_{m,n} \quad \text{for all } n \ge m \ge 0 . \]

This relationship is known as ‘subadditivity’. Next we note that by Problem 4
and Problem 5, $(R_{0,k}, R_{k,2k}, R_{2k,3k}, \dots)$ is a stationary sequence for any positive
integer $k$. Also, the stationarity of $X$ implies that for any $k \ge 0$, the sequence
$(R_{k,k}, R_{k,k+1}, R_{k,k+2}, \dots)$ has the same distribution as $R$. These three conditions
are the first three conditions in the Kingman-Liggett Ergodic Theorem. The
fourth and final condition of that theorem involves moment assumptions which
are also satisfied by $\widetilde{R}$. The conclusion is that $R_n/n$ converges a.s. as $n \to \infty$.
If $X$ is ergodic, then the a.s.-limit is a constant.


Theorem 7. [Kingman-Liggett Ergodic] Let $(Z_{m,n}: 0 \le m < n < \infty)$ be a
doubly indexed sequence of $\mathbb{R}$-valued random variables satisfying the following
four conditions:

(i) $Z_{0,n} \le Z_{0,m} + Z_{m,n}$ for $0 < m < n$;

(ii) for each $k = 1, 2, 3, \dots$, the sequence $(Z_{nk,(n+1)k}: n = 0, 1, 2, \dots)$ is
stationary;

(iii) the sequence $(Z_{k,n+k}: n = 1, 2, 3, \dots)$ has the same distribution for
all $k = 0, 1, 2, \dots$;

(iv) there exists a constant $c > 0$ such that $E(|Z_{0,n}|) \le cn$ for $n =
1, 2, 3, \dots$.

Then there exists an $\mathbb{R}$-valued random variable $L$ such that

\[ L = \lim_{n \to \infty} \frac{Z_{0,n}}{n} \quad \text{a.s. and in } L_1 \tag{28.10} \]

and

\[ E(L) = \inf_{n} \frac{E(Z_{0,n})}{n} . \tag{28.11} \]

Furthermore, if the stationary sequences in condition (ii) are all ergodic, then

\[ L = E(L) \quad \text{a.s.} \tag{28.12} \]


Remark 1. If the random variables $Z_{m,n}$ are nonnegative, then (iv) in the
Kingman-Liggett Subadditive Ergodic Theorem can be replaced by $E(Z_{0,1}) <
\infty$, since this inequality and (i) together imply (iv).




PROOF. Let

\[ \gamma = \inf_{n} \frac{E(Z_{0,n})}{n}, \qquad
   \overline{L} = \limsup_{n \to \infty} \frac{Z_{0,n}}{n}, \qquad
   \underline{L} = \liminf_{n \to \infty} \frac{Z_{0,n}}{n} . \]

We break the proof into five steps. In the first step, $|\gamma| < \infty$ is proved. Then in
the second, we obtain

\[ E(\overline{L}) \le \gamma . \tag{28.13} \]

In the third step, we show that

\[ \limsup_{n \to \infty} \frac{E(Z_{0,n})}{n} \le E(\underline{L}) . \tag{28.14} \]

These three parts of the proof immediately give the existence of an $\mathbb{R}$-valued
random variable $L$ satisfying (28.11) and the ‘almost sure’ aspect of (28.10) [by
Property (v) in Theorem 9 of Chapter 4]. The fourth step shows how (28.12) is
obtained as a by-product of the second and third steps. In the fifth and final step
of the proof, we use uniform integrability to obtain $L_1$ convergence at (28.10).

Step 1. Clearly $|E(Z_{0,n})| \le E(|Z_{0,n}|) \le cn$, from which it follows that
$|\gamma| \le c$.

Step 2. In order to prove (28.13), we set

\[ S_n^{(k)} = \sum_{m=0}^{n-1} Z_{mk,(m+1)k} \]

for positive integers $k$ and $n$. By condition (ii) and the Birkhoff Ergodic Theorem,

\[ L^{(k)} = \lim_{n \to \infty} \frac{S_n^{(k)}}{nk} \]

exists almost surely, and

\[ E(L^{(k)}) = \frac{E(Z_{0,k})}{k} . \tag{28.15} \]

From $n + 1$ applications of condition (i) we have

\[ Z_{0,nk+l} \le S_n^{(k)} + Z_{nk,nk+l} \]

for integers $k, n \ge 1$ and $0 \le l < k$, so

\[ \limsup_{n \to \infty} \frac{Z_{0,nk+l}}{nk+l}
   \le \limsup_{n \to \infty} \Bigl[ \frac{S_n^{(k)}}{nk+l} + \frac{Z_{nk,nk+l}}{nk+l} \Bigr] \tag{28.16} \]

for $k \ge 1$. By conditions (iii) and (iv), the random variables $Z_{nk,nk+l}$, $n =
1, 2, \dots$, are identically distributed and have finite mean, so $Z_{nk,nk+l}/(nk+l) \to
0$ a.s. as $n \to \infty$, by Problem 9 of Chapter 12. It follows from this fact and
(28.16) that

\[ \overline{L} \le L^{(k)} \quad \text{a.s.} \tag{28.17} \]

for $k = 1, 2, \dots$, from which it follows by the Fatou Lemma and (ii) that $E(\overline{L}) \le
\gamma$.
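Step 3. We next wish to prove that

\[ E(\underline{L}) \ge \limsup_{n \to \infty} \frac{E(Z_{0,n})}{n} . \tag{28.18} \]

(This part of the proof is considered to be the hardest part.) For each $k =
1, 2, \dots$, let $U_k$ be a random variable that is uniformly distributed on $\{1, 2, \dots, k\}$
and independent of the collection $(Z_{m,n})$. For $n = 0, 1, 2, \dots$, let

\[ Y_n^{(k)} = Z_{0,n+U_k} - Z_{0,n+U_k-1}, \]

where $Z_{0,0}$ is defined to be identically 0, and let

\[ Y^{(k)} = (Y_n^{(k)}: n = 0, 1, 2, \dots) \]

be the corresponding sequence. Choose a subsequence $(k_i)$ of the positive integers
so that

\[ \lim_{i \to \infty} E(Y_0^{(k_i)}) = \limsup_{k \to \infty} E(Y_0^{(k)}) \tag{28.19} \]

(possible by the definition of the limit supremum), and so that

\[ Y^{(k_i)} \xrightarrow{\ \mathcal{D}\ } Y \quad \text{as } i \to \infty \]

for some random sequence

\[ Y = (Y_n: n = 0, 1, 2, \dots) \]

of $\overline{\mathbb{R}}$-valued random variables (possible since the space $\overline{\mathbb{R}}^\infty$ is compact).

We wish to show that $Y$ is stationary. By the definition of convergence in
distribution,

\[ \lim_{i \to \infty} E(f \circ Y^{(k_i)}) = E(f \circ Y) \tag{28.20} \]

for any bounded continuous function $f\colon \overline{\mathbb{R}}^\infty \to \mathbb{R}$. By the definition of $Y^{(k)}$,

\[ E(f \circ Y^{(k)}) = E(f \circ \tau^{U_k - 1} \circ Y^{(1)}) = \frac{1}{k} \sum_{l=0}^{k-1} E(f \circ \tau^{l} \circ Y^{(1)}) . \tag{28.21} \]

Both (28.20) and (28.21) also hold with $f$ replaced by $f \circ \tau$ since $\tau$ is continuous,
so we have

\[ E(f \circ Y - f \circ \tau \circ Y) = \lim_{i \to \infty} \frac{1}{k_i} \bigl[ E(f \circ Y^{(1)}) - E(f \circ \tau^{k_i} \circ Y^{(1)}) \bigr] . \]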



Since $f$ is bounded, the limit on the right of this last expression is 0, implying
that $E(f \circ Y) = E(f \circ \tau \circ Y)$. Since $f$ is an arbitrary bounded continuous function,
it follows that $Y$ and $\tau \circ Y$ have the same distribution, so $Y$ is stationary.

Our next immediate goal is to prove the inequality (28.25) below. By the
definition of $Y^{(k)}$ and condition (i),

\[ (Y_0^{(k)})^+ = (Z_{0,U_k} - Z_{0,U_k-1})^+ \le (Z_{U_k-1,U_k})^+ . \tag{28.22} \]

By condition (iii), the right side of this relation has the same distribution as $Z_{0,1}^+$,
and by condition (iv), $E(Z_{0,1}^+) < \infty$. Thus the family $\{(Y_0^{(k)})^+: k = 1, 2, \dots\}$
is uniformly integrable. By the Uniform Integrability Criterion for convergence
in distribution (Problem 27 of Chapter 14),

\[ E(Y_0^+) = \lim_{i \to \infty} E[(Y_0^{(k_i)})^+] \le E[(Z_{0,1})^+] < \infty . \]

By the Fatou Lemma for convergence in distribution,

\[ E[(Y_0)^-] \le \liminf_{i \to \infty} E[(Y_0^{(k_i)})^-] . \]

Combining these last two inequalities with (28.19) gives

\[ \limsup_{k \to \infty} E(Y_0^{(k)}) \le E(Y_0) < \infty . \tag{28.23} \]

By the definition of $Y^{(k)}$,

\[ E(Y_0^{(k)}) = \frac{1}{k} \sum_{l=1}^{k} E(Z_{0,l} - Z_{0,l-1}) = \frac{E(Z_{0,k})}{k}, \tag{28.24} \]

so by (28.23) and the definition of $\gamma$,

\[ E(Y_0) \ge \limsup_{k \to \infty} \frac{E(Z_{0,k})}{k} \ge \gamma . \tag{28.25} \]

Thus, to complete this part of the proof, it is enough to prove that

\[ E(\underline{L}) \ge E(Y_0) . \tag{28.26} \]

We have shown that $Y$ is a stationary sequence and that $\gamma \le E(Y_0) < \infty$,
from which it follows that the random variables in the sequence $Y$ are almost
surely $\mathbb{R}$-valued. By the Birkhoff Ergodic Theorem,

\[ L' = \lim_{n \to \infty} \frac{Y_0 + \dots + Y_{n-1}}{n} \]

exists a.s. and $E(L') = E(Y_0)$, so (28.26) is equivalent to $E(\underline{L}) \ge E(L')$. We
will prove this last inequality by comparing the two sequences that produce $\underline{L}$
and $L'$.

We need some terminology. A function $f\colon \overline{\mathbb{R}}^\infty \to \overline{\mathbb{R}}$ is increasing if

\[ f(x_0, x_1, \dots) \le f(y_0, y_1, \dots) \quad \text{whenever } x_n \le y_n \text{ for all } n = 0, 1, 2, \dots . \]



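If $R^{(1)}$ and $R^{(2)}$ are two $\overline{\mathbb{R}}^\infty$-valued random variables (not necessarily defined on
the same probability space), $R^{(1)}$ stochastically dominates $R^{(2)}$ if $E(f \circ R^{(1)}) \ge
E(f \circ R^{(2)})$ for every increasing bounded continuous function $f\colon \overline{\mathbb{R}}^\infty \to \mathbb{R}$.

For each $k$, denote by $T^{(k)}$ the sequence of partial sums of the sequence $Y^{(k)}$.
Then by the definition of $Y^{(k)}$ and conditions (i) and (iii),

\begin{align*}
E(f \circ T^{(k)})
 &= E\bigl[ f\bigl( (Z_{0,U_k} - Z_{0,U_k-1}),\; (Z_{0,U_k+1} - Z_{0,U_k-1}),\; (Z_{0,U_k+2} - Z_{0,U_k-1}),\; \dots \bigr) \bigr] \\
 &\le E\bigl[ f\bigl( Z_{U_k-1,U_k},\; Z_{U_k-1,U_k+1},\; Z_{U_k-1,U_k+2},\; \dots \bigr) \bigr] \\
 &= E( f(Z_{0,1}, Z_{0,2}, Z_{0,3}, \dots) ) \\
 &= E(f \circ T^{(1)})
\end{align*}

for increasing bounded continuous $f$, so $T^{(1)}$ stochastically dominates $T^{(k)}$ for
$k = 1, 2, \dots$.

Now let $T$ denote the sequence of partial sums of the sequence $Y$. Convergence
in distribution implies that

\[ E(f \circ T) = \lim_{i \to \infty} E(f \circ T^{(k_i)}) \]

for all continuous bounded functions $f\colon \overline{\mathbb{R}}^\infty \to \mathbb{R}$. Thus $T^{(1)}$ stochastically
dominates $T$. Since $T^{(1)} = (Z_{0,n}: n = 1, 2, \dots)$ and $E(L') = E(Y_0)$, (28.26) is
now a consequence of Problem 29, which follows this proof.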

Step 4. If each of the stationary sequences in condition (ii) is ergodic, then
for each $k$, $L^{(k)} = E(Z_{0,k})/k$ a.s. By (28.17) and the definition of $\gamma$, $\overline{L} \le \gamma$
a.s. In Step 3 we showed that $E(\underline{L}) \ge \gamma$. The desired conclusion follows from
Property (v) in Theorem 9 of Chapter 4.

Step 5. The second and third steps permit us to write

\[ L = \overline{L} = \underline{L} \quad \text{a.s.} \]

Now let

\[ R_n = \sum_{l=0}^{n-1} (Z_{l,l+1})^+ . \]

By the Birkhoff Ergodic Theorem, $R_n/n$ converges a.s. as $n \to \infty$ to a random
variable $R$ with finite mean, and $E(|R - (R_n/n)|) \to 0$. By the Uniform
Integrability Criterion, the family $\{R_n/n: n = 1, 2, \dots\}$ is uniformly integrable.
Repeated applications of (i) show that

\[ (Z_{0,n})^+ \le R_n, \]

so the family $\{(Z_{0,n})^+/n: n = 1, 2, \dots\}$ is also uniformly integrable. Thus,

\[ \lim_{n \to \infty} E\Bigl( \frac{(Z_{0,n})^+}{n} \Bigr) = E(L^+) . \]

It follows from this equation, (28.13), and (28.14) that

\[ \lim_{n \to \infty} E\Bigl( \Bigl| \frac{Z_{0,n}}{n} \Bigr| \Bigr) = E(|L|) . \]

The Uniform Integrability Criterion now implies $L_1$ convergence at (28.10). □


Problem 29. Let $R = (R_n: n = 1, 2, \dots)$ and $S = (S_n: n = 1, 2, \dots)$ be sequences
of $\overline{\mathbb{R}}$-valued random variables, and suppose that $R$ stochastically dominates $S$.
Show that for any bounded increasing measurable function $f\colon \overline{\mathbb{R}}^\infty \to \overline{\mathbb{R}}$,

\[ E(f \circ R) \ge E(f \circ S), \]

provided either the expectation on the left is $> -\infty$ or the expectation on the right
is $< \infty$. Note: In the proof of the Kingman-Liggett Subadditive Ergodic Theorem,
this fact is used with $f(x_1, x_2, \dots) = \liminf_{n \to \infty} (x_n/n)$.


Example 7. [First-passage percolation] For an arbitrary pair of points $x, y \in
\mathbb{Z}^2$, a path from $x$ to $y$ is any finite sequence $(x_0, \dots, x_n)$ of points in $\mathbb{Z}^2$ such
that $x_0 = x$, $x_n = y$, and $|x_j - x_{j-1}| = 1$ for $j = 1, \dots, n$. Let $\{T_{x,y}: |x - y| = 1\}$
be an iid collection of nonnegative random variables, indexed by the ‘nearest
neighbor’ pairs in $\mathbb{Z}^2$, and for any path $\pi = (x_0, \dots, x_n)$, let

\[ U(\pi) = \sum_{j=1}^{n} T_{x_{j-1},x_j} . \]

For an arbitrary pair of points $x, y \in \mathbb{Z}^2$ (not necessarily nearest neighbors),
define

\[ M_{x,y} = \inf_{\pi} U(\pi), \]

where the infimum is taken over all paths from $x$ to $y$. The random variable $M_{x,y}$
is called the first-passage time from $x$ to $y$. We assume that $E(T_{x,y}) < \infty$ for
each $x$ and $y$ that are ‘nearest neighbors’, from which it follows that $E(M_{x,y}) < \infty$
for all $x$ and $y$.

Fix a vector $v \in \mathbb{Z}^2$, and let

\[ Z_{m,n} = M_{mv,nv}, \qquad 0 \le m < n < \infty . \]

In view of Remark 1 it is easily checked that the collection $(Z_{m,n})$ satisfies
conditions (i)-(iv) of Theorem 7. The reader is asked in Problem 30 to show
that the ergodicity condition is also satisfied. Thus

\[ C(v) = \lim_{n \to \infty} \frac{Z_{0,n}}{n} \]

exists a.s. and in $L_1$, where $C(v)$ is not random and is known as the time constant,
although this constant depends on $v$.
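
To make the subadditivity condition (i) tangible, the following sketch computes first-passage times by Dijkstra's algorithm. It is an illustration under several assumptions that are not in the text: the lattice is truncated to a finite box (so the computed values are upper bounds for the true first-passage times on $\mathbb{Z}^2$), the edge weights are Exponential(1), and the three points used lie on one row of the box at spacing 10. All function names are hypothetical.

```python
import heapq
import random

def edge_key(x, y):
    """Canonical key for the undirected nearest-neighbor edge {x, y}."""
    return (x, y) if x <= y else (y, x)

def make_weights(size, seed=0):
    """Assign iid Exponential(1) weights to the edges of the box {0..size-1}^2."""
    rng = random.Random(seed)
    weights = {}
    for i in range(size):
        for j in range(size):
            for nb in ((i + 1, j), (i, j + 1)):
                if nb[0] < size and nb[1] < size:
                    weights[edge_key((i, j), nb)] = rng.expovariate(1.0)
    return weights

def passage_time(weights, size, start, goal):
    """Least total weight over paths from start to goal that stay inside the box."""
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, x = heapq.heappop(heap)
        if x == goal:
            return d
        if d > dist.get(x, float("inf")):
            continue                      # stale heap entry
        i, j = x
        for nb in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            if 0 <= nb[0] < size and 0 <= nb[1] < size:
                nd = d + weights[edge_key(x, nb)]
                if nd < dist.get(nb, float("inf")):
                    dist[nb] = nd
                    heapq.heappush(heap, (nd, nb))
    return float("inf")

size, row = 31, 15
w = make_weights(size)
z01 = passage_time(w, size, (0, row), (10, row))    # stands in for Z_{0,1}
z12 = passage_time(w, size, (10, row), (20, row))   # stands in for Z_{1,2}
z02 = passage_time(w, size, (0, row), (20, row))    # stands in for Z_{0,2}
print(z02, z01 + z12)
```

Concatenating an optimal path for the first leg with an optimal path for the second leg gives a path for the whole trip, which is why the printed first value cannot exceed the second.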




* Problem 30. Prove that the collection (Zm,n) in Example 7 satisfies the ergodicity 
condition in the Kingman-Liggett Subadditive Ergodic Theorem. 


Problem 31. Let $X = (X_n: n \ge 0)$ be a stationary sequence of random variables
taking values in a Hilbert space with inner product $\langle \cdot, \cdot \rangle$. Show that

\[ \lim_{n \to \infty} \frac{\langle X_0 + \dots + X_n,\; X_0 + \dots + X_n \rangle}{(n+1)^2} \]

exists almost surely.


Problem 32. [Wiener sausage] Let $U, V, W$ be independent standard Wiener processes
on $[0, \infty)$ (as defined in Chapter 19), and for $t \ge 0$, let

\[ S_t = \bigcup_{0 \le s \le t} \bigl( (U_s, V_s, W_s) + A \bigr), \]

where $A$ is a Borel subset of $\mathbb{R}^3$. (Thus, $S_t$ is the random tube or ‘sausage’ swept
out by the set $A$ as it follows along with the randomly moving point $(U_s, V_s, W_s)$,
$0 \le s \le t$.) Let $\lambda$ denote Lebesgue measure in $\mathbb{R}^3$. Show that

\[ C(A) = \lim_{t \to \infty} \frac{\lambda(S_t)}{t} \]

exists a.s., where $C(A)$ is a constant depending on $A$, known as the ‘Newtonian
capacity’ of $A$. Prove that $C(A) > 0$ if $A$ has positive Lebesgue measure. (Note
that $C(A)$ can be positive even if $A$ has Lebesgue measure 0. For instance, let $A$
be the surface of a sphere.)


Problem 33. [Products of random matrices] Let $(A_n : n \ge 0)$ be an ergodic
stationary sequence of $d \times d$ matrices with positive entries, and for
$n \ge 0$, let $B_n$ be the product $A_0 A_1 \cdots A_n$. Show that for
$1 \le i, j \le d$,

    $\lim_{n \to \infty} \frac{\log B_n(i,j)}{n} = C(i,j)$  a.s.,

where $C(i,j)$ is a constant.


28.7. † Spectral analysis of stationary sequences


Throughout this section, we will restrict our attention to sequences of R-valued 
random variables that have finite second moments. Such sequences are called 
second-order sequences. 

For a second-order sequence $X = (X_0, X_1, \dots)$, we define, as in Chapter 5,
the mean vector $m_X$ and covariance matrix $\Sigma_X$. Here $m_X$ has infinitely
many coordinates and $\Sigma_X$ has infinitely many rows and columns:

    $m_X(k) = E(X_k)$,  $k = 0, 1, 2, \dots$

and

    $\Sigma_X(j,k) = \mathrm{Cov}(X_j, X_k)$,  $j, k = 0, 1, 2, \dots$.




A second-order sequence $X$ is second-order stationary if it has the same mean
vector and covariance matrix as the shifted sequence $\tau \circ X$, that is, if
$m_X(k)$ does not depend on $k$ and $\Sigma_X(j,k) = \Sigma_X(0, |j - k|)$ for
all $j$ and $k$.

The most important second-order sequences are Gaussian sequences,
defined in Example 5 to be those random sequences whose finite-dimensional
marginal distributions are normal. Since any normal distribution is uniquely
determined by its mean vector and covariance matrix (see Theorem 20 of
Chapter 13), it is clear that the distribution of a Gaussian sequence $X$ is
uniquely determined by $m_X$ and $\Sigma_X$. It follows from this fact that a
Gaussian sequence is stationary if and only if it is second-order stationary.


Problem 34. Prove that the mean vector and covariance matrix of any second- 
order stationary sequence are also the mean vector and covariance matrix of some 
stationary Gaussian sequence. 


In general, a second-order stationary sequence need not be stationary. For 
instance, the forthcoming Example 8 shows how to construct a certain class 
of second-order stationary sequences, and Problem 36 indicates precisely which 
sequences of that class are not stationary. It is remarkable that the main result 
of this section, which is a representation theorem, can be stated in a way that 
applies to general second-order stationary sequences. However, for simplicity we 
will only state and prove this result for the Gaussian case. 


Example 8. Let $Z_1, Z_2$ be uncorrelated $\mathbb{R}$-valued random variables,
each having mean 0 and finite variance $\sigma^2$, and let $\lambda$ be a real
number. Then the random sequence $X$ defined by

    $X_k = Z_1 \cos \lambda k + Z_2 \sin \lambda k$,  $k = 0, 1, 2, \dots$

is easily seen by direct computation to be a second-order stationary sequence
with mean vector $m_X = (0, 0, \dots)$ and covariance matrix $\Sigma_X$ satisfying

    $\Sigma_X(j,k) = \sigma^2 \cos \lambda (k - j)$,  $j, k \ge 0$.

This sequence is a second-order stationary pure-tone sequence with frequency
$\lambda / 2\pi$ and random coefficients $Z_1, Z_2$.

In applications in the ‘real world’, one usually considers the extension of such
sequences to continuous time. Thus, a pure-tone sequence $X$ becomes a pure-tone
‘random signal’:

    $X_t = Z_1 \cos \lambda t + Z_2 \sin \lambda t$,  $t \ge 0$.

This continuous-time stochastic process is a sine wave. Its frequency is
deterministic, but its amplitude and phase angle are random. The extension to
continuous time is not unique, since replacing $\lambda$ by $\lambda + 2\pi p$ for
some integer $p$ changes the continuous-time extension without changing the
original random sequence.
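
The two claims of Example 8 (the covariance formula and the non-uniqueness of the continuous-time extension) can be checked on a small concrete case. The sketch below is an illustration rather than part of the original text; it assumes a particular choice of coefficients, namely independent $Z_1, Z_2$ each equal to $\pm\sigma$ with probability 1/2, so that expectations are exact finite sums. The helper names are hypothetical.

```python
import math
from itertools import product

def pure_tone(z1, z2, lam, k):
    """X_k = z1*cos(lam*k) + z2*sin(lam*k) for fixed coefficient values."""
    return z1 * math.cos(lam * k) + z2 * math.sin(lam * k)

def exact_covariance(lam, sigma, j, k):
    """Cov(X_j, X_k) when Z1, Z2 are independent and each equals +sigma or
    -sigma with probability 1/2 (an assumed, illustrative distribution that is
    uncorrelated with mean 0 and variance sigma**2)."""
    outcomes = list(product((sigma, -sigma), repeat=2))  # four equally likely pairs
    mean_j = sum(pure_tone(a, b, lam, j) for a, b in outcomes) / 4
    mean_k = sum(pure_tone(a, b, lam, k) for a, b in outcomes) / 4
    second = sum(pure_tone(a, b, lam, j) * pure_tone(a, b, lam, k)
                 for a, b in outcomes) / 4
    return second - mean_j * mean_k
```

Comparing `exact_covariance(lam, sigma, j, k)` with `sigma**2 * math.cos(lam * (k - j))` reproduces the covariance formula, and evaluating `pure_tone` at `lam` and at `lam + 2 * math.pi * p` shows agreement at integer times but not at, say, time 0.5.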




A main result of this section is that an arbitrary second-order stationary 
Gaussian sequence can be decomposed into a mixture of second-order stationary 
pure-tone sequences with different frequencies, so it follows that a second-order 
stationary sequence can be naturally extended to a continuous-time stochastic 
process by replacing each of its component pure-tone sequences by the corre- 
sponding pure-tone random signal. 


Problem 35. Let $X$ be a second-order stationary pure-tone sequence. Show that
if the pair of random coefficients $(Z_1, Z_2)$ is normally distributed, then $X$
can be written in the form

    $X_k = \sigma Z \cos(\lambda k + U)$,

where $Z$ is standard normal, $U$ is uniformly distributed on $[0, 2\pi)$, and
$(Z, U)$ is an independent pair. (Compare with Example 5.)


Problem 36. Show that a second-order stationary pure-tone sequence $X$ with
frequency $\lambda / 2\pi$ is stationary if and only if the distribution of the
random vector $(Z_1, Z_2)$ of random coefficients is invariant under a rotation of
$\lambda$ radians about the origin in $\mathbb{R}^2$.


Let $X^{(1)}, \dots, X^{(m)}$ be second-order stationary pure-tone sequences
defined on a common probability space. If the random coefficients of $X^{(j)}$ are
uncorrelated with the random coefficients of $X^{(k)}$ for $1 \le j < k \le m$,
then it is easily checked that $a_1 X^{(1)} + \dots + a_m X^{(m)}$ is a
second-order stationary sequence; it is a mixture of finitely many pure-tone
sequences.

In order to be able to represent all second-order stationary sequences as mix- 
tures of pure-tone sequences, we need to be able to make sense out of mixing 
uncountably many pure-tone sequences. While it is possible to do so for the gen- 
eral case, there are some technicalities involved that we wish to avoid. Matters 
are much simpler in the Gaussian case, as illustrated by the following example. 


Example 9. Let $F\colon [-\pi, \pi] \to \mathbb{R}^+$ be an increasing
right-continuous function that satisfies $F(-\pi) \ge 0$. Extend $F$ to
$(-\infty, -\pi)$ by setting $F(\lambda) = 0$ for $\lambda < -\pi$. Also let $V$
and $W$ be independent standard Wiener processes on $[0, \infty)$. For $n \ge 0$,
define

(28.27)    $X_n = \int_{(-\pi)-}^{\pi} \cos \lambda n \, d(V \circ F)(\lambda) + \int_{(-\pi)-}^{\pi} \sin \lambda n \, d(W \circ F)(\lambda)$,

where the integrals are understood to be Riemann-Stieltjes integrals, the
existence of which follows from Problem 14 of Appendix D. Consider
Riemann-Stieltjes sums for the two integrals in (28.27), using the same point
partition for both. It is easily checked that when these two sums are added
together, one obtains a stationary Gaussian sequence that is a mixture of
finitely many pure-tone sequences. It follows from a straightforward limiting
argument that $X$ is a stationary Gaussian sequence.
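
The Riemann-Stieltjes sums mentioned in Example 9 can be written out explicitly. The sketch below is an illustration under stated assumptions, not the authors' construction: it uses a uniform partition of $[-\pi, \pi]$, evaluates the integrands at cell midpoints, and takes $F(\lambda) = \lambda + \pi$ in the usage note. Over a cell with $F$-increment $\Delta F$, the increments of $V \circ F$ and $W \circ F$ are independent $N(0, \Delta F)$ variables, so the sum is a finite mixture of pure-tone sequences whose covariance is $\sum_i \Delta F_i \cos \lambda_i (m - n)$. All function names are hypothetical.

```python
import math
import random

def partition(n_cells):
    """Uniform partition of [-pi, pi]: cell midpoints and common cell width."""
    width = 2 * math.pi / n_cells
    mids = [-math.pi + (i + 0.5) * width for i in range(n_cells)]
    return mids, width

def sum_covariance(F_increments, freqs, m, n):
    """Covariance of X_m and X_n for the finite mixture
    X_n = sum_i [cos(freqs[i]*n) * dV_i + sin(freqs[i]*n) * dW_i],
    where dV_i, dW_i are independent N(0, F_increments[i])."""
    return sum(dF * math.cos(lam * (m - n))
               for dF, lam in zip(F_increments, freqs))

def sample_sum(F_increments, freqs, length, rng):
    """One sample path (X_0, ..., X_{length-1}) of the finite mixture."""
    dV = [rng.gauss(0.0, math.sqrt(dF)) for dF in F_increments]
    dW = [rng.gauss(0.0, math.sqrt(dF)) for dF in F_increments]
    return [sum(v * math.cos(lam * n) + w * math.sin(lam * n)
                for v, w, lam in zip(dV, dW, freqs))
            for n in range(length)]
```

For the assumed choice $F(\lambda) = \lambda + \pi$, every increment equals the cell width, and `sum_covariance` returns approximately $2\pi$ when $m = n$ and approximately 0 otherwise, so the mixture behaves like an uncorrelated (hence, being Gaussian, independent) sequence.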




By using integration by parts (Proposition 2 of Appendix D), the integrals in
(28.27) can be written as:

(28.28)    $\int_{(-\pi)-}^{\pi} \cos \lambda n \, d(V \circ F)(\lambda) = (V \circ F)(\pi)(\cos \pi n) + n \int_{-\pi}^{\pi} (V \circ F)(\lambda) \sin \lambda n \, d\lambda$,

(28.29)    $\int_{(-\pi)-}^{\pi} \sin \lambda n \, d(W \circ F)(\lambda) = -n \int_{-\pi}^{\pi} (W \circ F)(\lambda) \cos \lambda n \, d\lambda$.

By setting $n = 0$ in (28.28) and (28.29) and taking expected values we get
$E(X_0) = 0$ and hence conclude that the mean vector is $(0, 0, \dots)$. We can use
these same two formulas in conjunction with the Fubini Theorem to obtain

    $\mathrm{Cov}(X_0, X_n) = E[(V \circ F)^2(\pi)](\cos \pi n) + n \int_{-\pi}^{\pi} E[(V \circ F)(\pi)(V \circ F)(\lambda)] \sin \lambda n \, d\lambda$,

where we have used the fact that $V$ and $W$ are independent in order to eliminate
one term. [The Fubini Theorem cannot be used directly with (28.27) since
the Riemann-Stieltjes integrals there are not measure-theoretic integrals or even
differences of such integrals unless $F$ is very special.] The first expectation on
the right side is the variance of the value taken by a Wiener process at time
$F(\pi)$, so it equals $F(\pi)$. The expectation inside the integral is the
covariance of a Wiener process at time $F(\pi)$ with the same Wiener process at
time $F(\lambda)$. Therefore

    $\mathrm{Cov}(X_0, X_n) = F(\pi) \cos \pi n + n \int_{-\pi}^{\pi} [F(\pi) \wedge F(\lambda)] \sin \lambda n \, d\lambda = \int_{(-\pi)-}^{\pi} \cos \lambda n \, dF(\lambda)$,

the last equality resulting from integration by parts. Thus, the covariance matrix
$\Sigma_X$ is given by

(28.30)    $\Sigma_X(m,n) = \int_{(-\pi)-}^{\pi} \cos \lambda (m - n) \, dF(\lambda)$.


We call the function $F$ in the preceding example the spectral distribution
function of the random sequence $X$. It can be regarded as the distribution
function of a finite measure $\mu$ on $[-\pi, \pi]$ called the spectral measure;
that is,

    $F(\lambda) = \mu[-\pi, \lambda]$,  $\lambda \in [-\pi, \pi]$.

We may view $X$ as a mixture of uncountably many pure-tone sequences, with $\mu$
determining the contribution of each individual frequency. The support of $\mu$ is
the spectrum of $X$. The points in the spectrum of $X$ at which $F$ is continuous
constitute the continuous spectrum of $X$, and if $F$ is continuous, $X$ has a pure
continuous spectrum. The discontinuity points of $F$ constitute the point spectrum
of $X$. If the spectral measure of $X$ assigns measure 0 to the complement of some
countable set, then $X$ has a pure point spectrum.

We have seen how to use a finite measure $\mu$ to construct a stationary Gaussian
sequence $X$. The covariance matrix and hence the distribution of $X$ are
determined by $\mu$ in this construction. It is easy to see, however, that more
than one measure $\mu$ can lead to the same covariance matrix. Consider, for
example, a Gaussian pure-tone sequence with covariance matrix given by
$\Sigma(m,n) = \cos(m - n)$. Any probability measure that is a convex combination
of $\delta_1$ and $\delta_{-1}$ can be used for a spectral measure. Thus, we have
been somewhat imprecise in talking about ‘the’ spectral measure and spectral
distribution function.
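
The lack of uniqueness just described is easy to confirm numerically. The following sketch is an illustration only; the dictionary representation of a discrete measure and the name `covariance_from_atoms` are assumptions made for this example. It integrates $\cos \lambda(m - n)$ against several convex combinations of $\delta_1$ and $\delta_{-1}$.

```python
import math

def covariance_from_atoms(atoms, m, n):
    """Integral of cos(lam*(m-n)) against a discrete measure given as a
    dictionary {location: mass}."""
    return sum(mass * math.cos(lam * (m - n)) for lam, mass in atoms.items())

# Convex combinations p*delta_1 + (1-p)*delta_{-1} for a few values of p.
candidates = [{1.0: p, -1.0: 1.0 - p} for p in (0.0, 0.25, 0.5, 1.0)]
```

Each measure in `candidates` yields the same value $\cos(m - n)$ for every pair $(m, n)$, because cosine is even. The choice `p = 0.5` is the symmetric representative and `p = 1.0` is the one supported by $[0, \pi]$.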

There are several common conventions for eliminating this lack of uniqueness.
Two of these will be useful to us. The first is to consider only measures $\mu$
that are symmetric about 0. This option is particularly useful in proofs in which
complex exponentials are used to simplify calculations involving cos and sin, and
it will also be useful in Example 10. The second is to consider only measures
$\mu$ that are supported by $[0, \pi]$. This will typically be our choice when
trying to make the formulas look as simple as possible, such as in the next lemma
and theorem.


Lemma 8. [Herglotz] A matrix $\Sigma$ with entries $\Sigma(m,n)$, $m, n \ge 0$,
is the covariance matrix of a second-order stationary sequence $X$ if and only if
there exists a finite measure $\mu$ on $[0, \pi]$ such that

(28.31)    $\Sigma(m,n) = \int_{[0,\pi]} \cos \lambda (m - n) \, \mu(d\lambda)$

for $m, n \ge 0$. Furthermore, the correspondence between $\Sigma$ and $\mu$ is
one-to-one.


We omit the proof of the preceding lemma because it is quite similar to the 
proof of Theorem 13 of Chapter 13. 

The following theorem is the main result of this section. It says that every 
stationary Gaussian sequence can be uniquely represented as a mixture with 
respect to its spectral measure of pure tone sequences. The result is an immediate 
consequence of the argument given in Example 9, the Herglotz Lemma, and the 
fact that the distribution of a Gaussian vector is determined by its mean vector 
and covariance matrix. 


Theorem 9. Let $V, W$ be independent Wiener processes on $[0, \infty)$ and let
$F\colon [0, \pi] \to \mathbb{R}^+$ be an increasing right-continuous function
that satisfies $F(0) \ge 0$, and extend $F$ to $(-\infty, 0)$ by setting
$F(\lambda) = 0$ for $\lambda < 0$. For $n \ge 0$, define

(28.32)    $X_n = \int_{0-}^{\pi} \cos \lambda n \, d(V \circ F)(\lambda) + \int_{0-}^{\pi} \sin \lambda n \, d(W \circ F)(\lambda)$,

where the two integrals are Riemann-Stieltjes integrals. Then
$X = (X_n : n \ge 0)$ is a stationary Gaussian sequence with mean vector
$(0, 0, \dots)$ and covariance matrix $\Sigma_X$ given by

    $\Sigma_X(m,n) = \int_{0-}^{\pi} \cos \lambda (m - n) \, dF(\lambda)$,  $m, n \ge 0$.

Moreover, if $Y = (Y_n : n \ge 0)$ is any stationary Gaussian sequence with
spectral distribution function $F$, then the sequence $(Y_n - EY_0 : n \ge 0)$ has
the same distribution as $X$.


Problem 37. Let $X$ be a stationary Gaussian sequence with spectral measure $\mu$.
Suppose that $\mu$ can be written as the sum of pairwise mutually singular measures
$\mu_1, \dots, \mu_k$. Let $X^{(1)}, \dots, X^{(k)}$ be independent stationary
Gaussian sequences such that for $j = 1, \dots, k$, $X^{(j)}$ has spectral measure
$\mu_j$. Prove that $X$ has the same distribution as $X^{(1)} + \dots + X^{(k)}$.


Problem 38. Show that a stationary Gaussian sequence has the same distribution
as the sum of two appropriate independent Gaussian sequences $X^{(1)}$ and
$X^{(2)}$, with $X^{(1)}$ having a pure continuous spectrum and $X^{(2)}$ a pure
point spectrum.


Problem 39. Show that if a stationary Gaussian sequence X is ergodic then it has a 
pure continuous spectrum. Hint: Use the preceding two problems and Problem 27. 


Problem 40. Let $X$ be a stationary Gaussian sequence with spectral measure $\mu$.
Show that if $\mu$ is absolutely continuous with respect to Lebesgue measure, then
$X$ is mixing and therefore ergodic. Hint: First consider the case in which the
density of $\mu$ is continuous, and use the continuity to prove that
$\mathrm{Cov}(X_0, X_n) \to 0$ as $n \to \infty$.


Problem 41. [Second-order ergodic theorem] Let $X = (X_n : n \ge 0)$ be a (not
necessarily Gaussian) second-order stationary sequence, and for $n > 0$, let
$S_n = X_0 + \dots + X_{n-1}$. Show that there exists a random variable $L$ with
finite second moment such that $S_n / n$ converges to $L$ in $L_2$ as
$n \to \infty$. In other words, show that

    $\lim_{n \to \infty} E\Big( \Big( \frac{S_n}{n} - L \Big)^2 \Big) = 0$.

For the Gaussian case, show that the convergence is also almost sure. Hint: First
use the Birkhoff Ergodic Theorem for the Gaussian case. To generalize, show that
the property of a sequence being Cauchy in $L_2$ depends only on the mean vector
and covariance matrix.


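It turns out that one can explicitly evaluate the limit in the preceding problem
in the Gaussian case. Here is a brief sketch of the method. We use the formulas

    $\sum_{k=0}^{n-1} \cos \lambda k = \Re\Big( \frac{1 - e^{i \lambda n}}{1 - e^{i \lambda}} \Big)$  and  $\sum_{k=0}^{n-1} \sin \lambda k = \Im\Big( \frac{1 - e^{i \lambda n}}{1 - e^{i \lambda}} \Big)$,

which are proved by summing the appropriate finite geometric series of complex
exponentials. By Theorem 9,

(28.33)    $\frac{S_n}{n} = E(X_0) + \int_{0-}^{\pi} \Re\Big( \frac{1 - e^{i \lambda n}}{n (1 - e^{i \lambda})} \Big) \, d(V \circ F)(\lambda) + \int_{0-}^{\pi} \Im\Big( \frac{1 - e^{i \lambda n}}{n (1 - e^{i \lambda})} \Big) \, d(W \circ F)(\lambda)$.
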
The integrals in (28.33) are Riemann-Stieltjes integrals, not measure-theoretic
integrals. Accordingly, tools such as the Dominated Convergence Theorem are
not likely to be useful. Nevertheless, it is reasonable to make a guess of a
limiting random variable by taking limits inside of the integrals. The integrand
in the first integral converges to the indicator function of the singleton $\{0\}$
as $n \to \infty$, and the integrand in the second integral converges to 0. Thus,
the guess is that the limit as $n \to \infty$ of the entire expression is
$E(X_0) + (V \circ F)(0)$. In particular, the limit is constant if and only if
$F(0) = \mu\{0\} = 0$, in which case the limit is $E(X_0)$. Once this guess has
been made, it can be verified by calculating some means and second moments, as in
the following problem.
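
The geometric-series formulas and the pointwise limits of the two integrands can be checked numerically. The sketch below is an illustration with hypothetical function names; it adopts the convention (an assumption not spelled out in the text) that the ratio takes its limiting value $n$ at $\lambda = 0$, and it is intended only for $\lambda \in [0, \pi]$.

```python
import cmath
import math

def geometric_ratio(lam, n):
    """(1 - e^{i*lam*n}) / (1 - e^{i*lam}), with the value n at lam = 0
    (the limit as lam -> 0). Intended for lam in [0, pi]."""
    if lam == 0:
        return complex(n, 0.0)
    return (1 - cmath.exp(1j * lam * n)) / (1 - cmath.exp(1j * lam))

def averaged_integrands(lam, n):
    """Real and imaginary parts of the ratio above divided by n, that is,
    the averages of cos(lam*k) and sin(lam*k) over k = 0, ..., n-1."""
    r = geometric_ratio(lam, n) / n
    return r.real, r.imag
```

At `lam = 0` the pair returned by `averaged_integrands` is `(1.0, 0.0)` for every `n`, while for fixed `lam` in $(0, \pi]$ both entries are bounded in absolute value by $2 / (n |1 - e^{i\lambda}|)$ and so tend to 0, which is the behavior used in the guess above.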


Problem 42. Let $J_n$ and $K_n$ be the two integrals in (28.33). Show that
$E(K_n^2)$ and $E[(J_n - (V \circ F)(0))^2]$ converge to 0 as $n \to \infty$. Then
explain why these facts imply that $S_n / n \to E(X_0) + (V \circ F)(0)$ a.s. as
$n \to \infty$. Hint: Use Riemann-Stieltjes sums to calculate the means and second
moments of $J_n$ and $K_n$. Also, note that $(V \circ F)(0)$ is independent of
‘most’ of the integral $J_n$, except for the part that involves integration
near 0.


Problem 43. Let $X = (X_n : n \ge 0)$ be an ergodic stationary Gaussian sequence,
with $E(X_0) = 0$. Fix an integer $j \ge 0$. Show that

    $\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} X_k X_{k+j}$

exists a.s., and find a simple expression for this limit.


The following random sequences (some of which have continuous-time ana- 
logues mentioned in Chapter 33) are stationary, Markov, and Gaussian. 


Example 10. [Ornstein-Uhlenbeck sequence] Let $c \in (0, \infty)$ and consider
the symmetric spectral measure on $[-\pi, \pi]$ with density

    $\lambda \mapsto \sum_{k=-\infty}^{\infty} \frac{c}{\pi [c^2 + (\lambda + 2\pi k)^2]}$.

Using (28.30), the Dominated Convergence Theorem, a change of variables, and
the periodicity of cos, we obtain a formula for the covariance function:

    $\Sigma(m,n) = \sum_{k=-\infty}^{\infty} \int_{-\pi}^{\pi} \cos \lambda (m - n) \, \frac{c \, d\lambda}{\pi [c^2 + (\lambda + 2\pi k)^2]}$
    $\qquad = \sum_{k=-\infty}^{\infty} \int_{\pi(-1+2k)}^{\pi(1+2k)} \cos y (m - n) \, \frac{c \, dy}{\pi (c^2 + y^2)} = \int_{-\infty}^{\infty} \cos y (m - n) \, \frac{c \, dy}{\pi (c^2 + y^2)}$,

which we recognize as the characteristic function of a Cauchy distribution. Thus
it equals $e^{-c|m-n|}$. Letting $\rho = e^{-c}$, we see that for every
$\rho \in (0, 1)$ there is a stationary Gaussian sequence with correlation function
given by $(m,n) \mapsto \rho^{|m-n|}$. It is easily checked that if $X$ is such a
sequence, then the sequence $(X_0, -X_1, X_2, \dots)$ is stationary Gaussian with
correlation function $(m,n) \mapsto (-\rho)^{|m-n|}$.
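
The covariance computation in Example 10 can be sanity-checked by numerical integration. The sketch below is an approximation under labeled assumptions: the infinite sum defining the density is truncated to $|k| \le 50$, and the integral over $[-\pi, \pi]$ is replaced by a midpoint rule with 400 cells, so agreement with $e^{-c|m-n|}$ is only to within a small truncation error. The function names are hypothetical.

```python
import math

def ou_density(lam, c, terms=50):
    """Truncated version of sum_k c / (pi * (c^2 + (lam + 2*pi*k)^2)),
    keeping only |k| <= terms (truncation is an approximation)."""
    return sum(c / (math.pi * (c * c + (lam + 2 * math.pi * k) ** 2))
               for k in range(-terms, terms + 1))

def covariance(lag, c, cells=400, terms=50):
    """Midpoint-rule approximation of the integral over [-pi, pi] of
    cos(lam * lag) times the truncated density."""
    width = 2 * math.pi / cells
    total = 0.0
    for i in range(cells):
        lam = -math.pi + (i + 0.5) * width
        total += math.cos(lam * lag) * ou_density(lam, c, terms) * width
    return total
```

For instance, with `c = 1.0` the values `covariance(lag, 1.0)` for `lag = 0, 1, 2, 3` should be close to `math.exp(-lag)`, the small discrepancy at `lag = 0` reflecting the Cauchy mass discarded by the truncation.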

To accommodate $\rho = \pm 1$ and $\rho = 0$, we note that a spectral measure
supported by $\{0\}$ gives the correlation function $(m,n) \mapsto 1^{|m-n|}$, and
a spectral measure equal to any multiple of Lebesgue measure on $[-\pi, \pi]$ gives
the correlation function $(m,n) \mapsto 0^{|m-n|}$.

The stationary sequences with correlation functions of the form
$(m,n) \mapsto \rho^{|m-n|}$, $\rho \in [-1, 1]$, are Ornstein-Uhlenbeck sequences.
It turns out that these are exactly the stationary Gaussian sequences that are also
Markov. The argument in one direction is given below. Problem 45 addresses the
other direction.

Let $X = (X_0, X_1, \dots)$ be a stationary Markov Gaussian sequence, and set
$\rho = \mathrm{Corr}(X_0, X_1)$. With no loss of generality, we assume that
$\mathrm{Var}(X_n) = 1$ and $E(X_n) = 0$ for each $n$. In this situation we see
from Problem 48 of Chapter 21 that for any $m$ and $n$,

    $E(X_n \mid X_m) = \mathrm{Corr}(X_m, X_n) \, X_m$.

We will prove that $E(X_n \mid X_0) = \rho^n X_0$, from which it will then
immediately follow by stationarity that the covariance function of $X$ is
$(m,n) \mapsto \rho^{|m-n|}$. That $E(X_1 \mid X_0) = \rho X_0$ is true from the
definition of $\rho$ and the relation between correlations and conditional
expectations. For an induction proof we suppose that
$E(X_{n-1} \mid X_0) = \rho^{n-1} X_0$. By stationarity,
$E(X_n \mid X_1) = \rho^{n-1} X_1$. Since $X$ is Markov, we then obtain

    $E(X_n \mid (X_1, X_0)) = \rho^{n-1} X_1$.

Take conditional means of both sides given $X_0$ to get
$E(X_n \mid X_0) = \rho^{n-1} E(X_1 \mid X_0) = \rho^n X_0$, as desired.
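
The relation $E(X_n \mid X_0) = \rho^n X_0$ suggests a simple way to simulate such a sequence. The sketch below is not taken from the text: it assumes the familiar autoregressive recursion $X_{n+1} = \rho X_n + \sqrt{1 - \rho^2}\, \xi_{n+1}$ with iid standard normal $\xi_n$ and standard normal $X_0$, which produces a Gaussian Markov sequence with unit variances, and it estimates the correlations empirically. The function names are hypothetical.

```python
import math
import random

def ou_sequence(rho, length, rng):
    """Sample (X_0, ..., X_{length-1}) with X_0 ~ N(0, 1) and
    X_{n+1} = rho * X_n + sqrt(1 - rho^2) * xi_{n+1}, xi iid N(0, 1).
    (Assumed construction, used here only for illustration.)"""
    x = rng.gauss(0.0, 1.0)
    path = [x]
    scale = math.sqrt(1.0 - rho * rho)
    for _ in range(length - 1):
        x = rho * x + scale * rng.gauss(0.0, 1.0)
        path.append(x)
    return path

def sample_correlation(path, lag):
    """Empirical average of X_k * X_{k+lag}; means are 0 and variances are 1,
    so this estimates the correlation at the given lag."""
    n = len(path) - lag
    return sum(path[k] * path[k + lag] for k in range(n)) / n
```

For a long path with, say, `rho = 0.5`, the empirical values at lags 0, 1, 2, 3 should be near 1, 0.5, 0.25, 0.125, in line with the correlation function $(m,n) \mapsto \rho^{|m-n|}$; the agreement is statistical and improves with the path length.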


Problem 44. For each $\rho \in [-1, 0)$, find a spectral measure corresponding to
the correlation function $(m,n) \mapsto \rho^{|m-n|}$.


* Problem 45. Prove that a stationary Gaussian sequence with correlation function
$(m,n) \mapsto \rho^{|m-n|}$ is Markov. Hint: One approach is to use Example 4 of
Chapter 21.


PART 6 


Stochastic Processes 




Although the term ‘stochastic process’ has been given a variety of meanings in
the literature, we use it in this book to refer to any collection of random
variables $X_t$ defined on a common probability space $\Omega$, taking values in a
common ‘state space’ $\Psi$, and indexed by the continuous-time parameter
$t \in [0, \infty)$. Thus, stochastic processes are continuous-time analogues of
random sequences. Often the shorter term process is used in context. It should
come as no surprise that we will now encounter continuous-time versions of many of
the types of random sequences introduced in previous parts. Our emphasis will be
on concepts and types of behavior that are either more natural in or unique to the
continuous-time setting.

We focus our attention on four important classes of stochastic processes: Lévy 
processes (Chapter 30), pure-jump Markov processes with bounded rates (Chap- 
ter 31), interacting particle systems with bounded, finite-range rates (Chap- 
ter 32), and 1-dimensional diffusions with ‘nice’ coefficients (Chapter 33). In 
each case, we provide a completely rigorous construction procedure. 

The main ingredient in the construction of the first three classes of processes 
is a type of random set known as a ‘Poisson point process’ (Chapter 29). Poisson 
point processes are not really stochastic processes in our sense of the term, since 
they do not involve the time parameter. However, they are important in their 
own right, and our use of them shows that they are closely related to stochastic
processes.

The construction of diffusions requires the ‘stochastic calculus’. In Chapter 33, 
this important concept is introduced, and then diffusions are seen to be solutions 
of ‘stochastic differential equations’ that involve Brownian motion in a manner 
that is analogous to the way ordinary differential equations involve the time 
variable. 

All of the stochastic processes mentioned above are ‘Markov processes’. They 
also are closely related to many continuous-time martingales. A brief introduc- 
tion to the general theory of Markov processes and their relationship to martin- 
gales is provided in Chapter 31. 


CHAPTER 29 
Point Processes 


Loosely speaking, a point process is a random ‘discrete’ set of points in some 
Polish space. Thus, one could use a point process to model experiments like 
throwing grains of sand onto the floor and noting their locations, or pointing 
an astronomical telescope in a random direction and noting the positions of the 
stars seen in the field of view. A mathematical example would be the random 
set of values taken by a finite sequence of random variables. This latter example 
makes it clear that we may want to generalize the notion of sets to allow a given 
point to appear more than once. It turns out that there is a nice mathematical
way to accommodate the generalization using a certain class of
$\overline{\mathbb{Z}}^+$-valued measures. The relevant definitions and basic facts are given in the first section. The
most important point processes are ‘Poisson point processes’, which are char- 
acterized by the property that their intersections with disjoint subsets of the 
underlying Polish space are independent. These are treated in Sections 3 and 4. 
An important tool for studying the distributions of point processes is introduced 
in the fourth section. This tool is needed in the final two sections of the chapter, 
where various operations on point processes are studied. In particular, the con- 
vergence in distribution of point processes is considered in the final section. One 
nice result from that section is that the Poisson point processes arise as limits 
of certain naturally defined sequences. 


29.1. Point processes as random Radon measures 


We begin by describing the appropriate setting for point processes. A Polish
space $\Psi$ is locally compact if for every $x \in \Psi$, there exists
$\varepsilon > 0$ such that the closed ball $\overline{B}(x, \varepsilon)$ is
compact. Equivalently, local compactness means that every point $x \in \Psi$ has
an open neighborhood with compact closure.

A measure $\mu$ on a locally compact Polish space is a Radon measure if $\mu(C)$
is finite for every compact set $C$. Since we wish to define a point process to be
a certain type of random Radon measure, we need to make the collection of Radon
measures on a given locally compact Polish space into a measurable space. The
following lemma makes this task easy to do.


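Lemma 1. Let $\Psi$ be a locally compact Polish space. Then there exist open
balls $B(y_1, \varepsilon_1), B(y_2, \varepsilon_2), \dots$ such that

    $\Psi = \bigcup_{n=1}^{\infty} B(y_n, \varepsilon_n)$

and for each $n$, the closed ball $\overline{B}(y_n, \varepsilon_n)$ is compact.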


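PROOF. Let $(y_1, y_2, \dots)$ be a dense sequence of points in $\Psi$. For
$n = 1, 2, \dots$, set

(29.1)    $\varepsilon_n = 1 \wedge \tfrac{1}{2} \sup\{\varepsilon : \overline{B}(y_n, \varepsilon) \text{ is compact}\}$,

which is positive since $\Psi$ is locally compact. Moreover,
$\overline{B}(y_n, \varepsilon_n)$ is compact for every $n$.

To show that $\Psi \subseteq \bigcup_{n=1}^{\infty} B(y_n, \varepsilon_n)$, we let
$x$ be an arbitrary member of $\Psi$ and choose $\delta \in (0, 3]$ so that
$\overline{B}(x, \delta)$ is compact. Fix $n$ so that the distance between $x$ and
$y_n$ is less than $\delta / 3$. Then $\overline{B}(y_n, 2\delta/3)$ is a closed
subset of the compact set $\overline{B}(x, \delta)$, and is thus compact. By
(29.1), $\varepsilon_n \ge \delta / 3$ and thus $x \in B(y_n, \varepsilon_n)$, as
desired. □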


Let $\Psi$ be a locally compact Polish space and $\mathcal{M}$ the collection of
all Radon measures on $\Psi$. We will place the same type of measurability
structure on $\mathcal{M}$ that we did on the space of probability measures in
Chapter 21. Denote by $\mathcal{H}$ the smallest $\sigma$-field of subsets of
$\mathcal{M}$ such that for every Borel set $B \subseteq \Psi$, the function from
$\mathcal{M}$ to $\overline{\mathbb{R}}^+$ defined by $\mu \mapsto \mu(B)$ is
$\mathcal{H}$-measurable.


Proposition 2. The space $(\mathcal{M}, \mathcal{H})$ defined above is a Borel space.


PROOF. According to Lemma 1, the locally compact Polish space $\Psi$ is the
union of countably many compact sets $C_1, C_2, \dots$. For each $n \ge 1$, let

    $A_n = C_n \setminus \bigcup_{k=1}^{n-1} C_k$,

and let $\mathcal{M}_n$ be the collection of all finite measures on the Borel
subsets of $A_n$. Note that $(A_1, A_2, \dots)$ is a measurable partition of
$\Psi$. Since each set $A_n$ is a subset of a compact set, the restriction to
$A_n$ of any Radon measure on $\Psi$ is a finite measure on the Borel subsets of
$A_n$. We leave it to the reader to use this fact to show that there is a Borel
isomorphism between $(\mathcal{M}, \mathcal{H})$ and the infinite product space

    $\bigotimes_{n=1}^{\infty} (\mathcal{M}_n, \mathcal{H}_n)$,

where for each $n$, $\mathcal{H}_n$ is defined in the same manner as $\mathcal{H}$.
Thus, it is enough to show that each space $(\mathcal{M}_n, \mathcal{H}_n)$ is a
Borel space. Identify each nonzero finite measure $\mu \in \mathcal{M}_n$ with the
pair $(\mu(A_n), \mu / \mu(A_n))$. This identification provides an obvious Borel
isomorphism between the nonzero finite measures on $A_n$ and the product of two
Borel spaces, namely $(0, \infty)$ and the space of probability measures on $A_n$.
The desired result follows immediately. □


We will be particularly interested in Radon measures that take values in
$\overline{\mathbb{Z}}^+$, so the following result is relevant.


Corollary 3. Let $\Psi$ be a locally compact Polish space. The collection of all
$\overline{\mathbb{Z}}^+$-valued Radon measures on $\Psi$ is a measurable subset of
the Borel space of all Radon measures on $\Psi$, and thus is itself a Borel space.


Problem 1. Prove the preceding corollary. 


Problem 2. Show that every Radon measure on a locally compact Polish space is
$\sigma$-finite.


We are now prepared to define point processes. 


Definition 4. Let $\Psi$ be a locally compact Polish space. A point process on
$\Psi$ is a random variable taking values in the space of
$\overline{\mathbb{Z}}^+$-valued Radon measures on $\Psi$.


We often use the letter $X$ to denote a point process, and $X(B)$ to denote the
measure assigned to a Borel set $B$ by the random measure $X$. It is clear from the
definition of the space $(\mathcal{M}, \mathcal{H})$ that if $A$ is a Borel subset
of $\Psi$ and $X$ is a point process on $\Psi$, then the set $[X(A^c) = 0]$ is an
event. If this event has probability 1, then we sometimes say that $X$ is a point
process on $A$.

It is not obvious from Definition 4 why the objects defined are named as they
are. The ‘points’ in question have to do with the support of the random measure.
We will see that the support of a $\overline{\mathbb{Z}}^+$-valued Radon measure
$\mu$ consists precisely of those ‘locations’ $x$ such that $\mu\{x\}$ is a
positive integer. We take the point of view that $\mu$ represents a collection of
points that occupy the locations in its support. Any location $x$ such that
$\mu\{x\} \ge 2$ is ‘multiply occupied’.

To be more precise, we introduce some more terminology. A closed subset $A$
of a Polish space $\Psi$ is discrete if for every $x \in A$ there is an open set
$B$ such that $A \cap B = \{x\}$. Equivalently, a closed set $A$ is discrete if no
point $x \in A$ is the limit of a convergent sequence of points in
$A \setminus \{x\}$. The following result implies that a point process is a random
$\overline{\mathbb{Z}}^+$-valued Radon measure with discrete support.


Proposition 5. Let $C$ be the support of a $\overline{\mathbb{Z}}^+$-valued Radon
measure $\mu$ on a locally compact Polish space $\Psi$. Then

(29.2)    $C = \{x \in \Psi : \mu\{x\} \ge 1\}$

and $C$ is discrete.




PROOF. Because of Lemma 1, there is no loss of generality in assuming that
$\Psi$ is compact and $\mu$ is finite. Let $x$ be a point in $\Psi$. By the
Continuity of Measure Theorem,

    $\mu\{x\} = \lim_{\varepsilon \downarrow 0} \mu(B(x, \varepsilon))$.

Since $\mu$ is $\mathbb{Z}^+$-valued, there exists an $\varepsilon > 0$ such that
$\mu\{x\} = \mu(B(x, \varepsilon))$. If $\mu(B(x, \varepsilon)) = 0$, then $x$ is
not in the support, since the support of a measure is a closed set. Otherwise,
$\mu\{x\} \ge 1$ and $x$ is clearly in the support of $\mu$. We have proved (29.2).

To show that $C$ is discrete, let $x$ be a member of $C$. As in the preceding
paragraph, there exists an $\varepsilon > 0$ such that
$\mu\{x\} = \mu(B(x, \varepsilon))$. It follows from (29.2) that
$C \cap B(x, \varepsilon) = \{x\}$, as desired. □


Let $\mu$ be a $\overline{\mathbb{Z}}^+$-valued Radon measure on a locally compact
Polish space $\Psi$. Since each singleton $\{x\}$ is a compact set, $\mu$ assigns a
finite integer value to each singleton. The multiplicity (with respect to $\mu$)
of a point $x$ is the integer $\mu\{x\}$. The preceding proposition implies that
$\mu$ is uniquely determined by the multiplicities of the points in its support.
In fact, we have the following formula:

    $\mu = \sum_{x \in C} \mu\{x\} \, \delta_x$,

where $C$ is the support of $\mu$ and $\delta_x$ denotes the delta distribution at
$x$. This formula is the reason that many of the examples of random and nonrandom
$\overline{\mathbb{Z}}^+$-valued Radon measures in this chapter are expressed as
linear combinations of delta distributions.

A set whose members have been assigned positive integer multiplicities is
called a multiset. Thus, there is a natural one-to-one correspondence between
$\overline{\mathbb{Z}}^+$-valued Radon measures on a locally compact Polish space
$\Psi$ and discrete multisets of points in $\Psi$. The notation used for multisets
is similar to that used for sets, except that elements with multiplicities greater
than 1 are repeated according to their multiplicities. Thus $\{a, a, b\}$ is the
multiset in which the element $a$ has multiplicity 2 and the element $b$ has
multiplicity 1. As with ordinary set notation, the ordering is not important, so
$\{a, b, a\}$ is another way of writing the same multiset. Expressed as a Radon
measure, this multiset is $2\delta_a + \delta_b$.

Thus, a point process can be viewed as a ‘random discrete multiset’, and
sometimes it is convenient to express point processes in this manner. For example,
in the introduction to this chapter, we mentioned that one way to construct a point
process is to use the values of a finite sequence $(Y_1, \dots, Y_n)$ of random
variables. Expressed as a random Radon measure, the resulting point process is

    $\delta_{Y_1} + \dots + \delta_{Y_n}$.

As a random multiset, the point process is $\{Y_1, \dots, Y_n\}$. However, the set
notation for a multiset should only be used when accompanied by words that are
indicative of a multiset. Thus for instance, in the last section of Chapter 12
where the size of the image of a random walk is treated and no mention of multisets
is made, repeated occurrences of a value are not to be counted.

From the multiset point of view, $X(B)$ is the number of members of $X$ that lie
in $B$, counting multiplicities, and for $\psi$ a member of $\Psi$, $X\{\psi\}$ is
the multiplicity of $\psi$, possibly 0.
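
The correspondence between finite multisets and integer-valued measures can be mirrored in code. The sketch below is an illustration only: it represents a nonrandom measure $\delta_{y_1} + \dots + \delta_{y_n}$ by a `collections.Counter` of multiplicities and describes a set $B$ by a membership predicate. The function names are hypothetical.

```python
from collections import Counter

def point_measure(points):
    """Represent delta_{y_1} + ... + delta_{y_n} by the multiplicity of each
    location; the order of the points is irrelevant."""
    return Counter(points)

def mass(mu, in_B):
    """mu(B): number of points lying in B, counting multiplicities.
    `in_B` is a predicate describing the set B."""
    return sum(mult for x, mult in mu.items() if in_B(x))

def support(mu):
    """Locations with multiplicity at least 1, as in (29.2)."""
    return {x for x, mult in mu.items() if mult >= 1}
```

For example, `point_measure(["a", "b", "a"])` and `point_measure(["a", "a", "b"])` compare equal, matching the remark that $\{a, b, a\}$ and $\{a, a, b\}$ denote the same multiset, and a location outside the support has multiplicity 0.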


Problem 3. Prove that a discrete subset of a compact Polish space is finite, and 
that a discrete subset of a locally compact Polish space is countable. Thus, point 
processes have countable support a.s. 


Problem 4. Let $Y_j$, $1 \le j \le n$, be iid random variables taking values in a
locally compact Polish space $\Psi$. Let $X$ be the point process defined by
$X = \sum_{j=1}^{n} \delta_{Y_j}$. For arbitrary Borel subsets $A$ and $B$ of
$\Psi$ calculate the distribution of the random number $X(A)$ and the distribution
of the random pair $(X(A), X(B))$ in terms of the common distribution of the random
variables $Y_j$.


* Problem 5. Consider an urn with balls of $m \ge 3$ colors and denote by $r_j$ the
number of balls of color $j$, $1 \le j \le m$. Fix an integer
$n \le \sum_{j=1}^{m} r_j$ and consider the point process $X$ that assigns to each
color the number of balls of that color drawn when a total of $n$ balls are drawn
without replacement. Calculate the distributions of the random number $X\{1\}$,
the random pair $(X\{1\}, X\{2\})$, and the random triple
$(X\{1\}, X\{2\}, X\{3\})$.


Problem 6. Let $\Psi = \{1, 2, \dots, n\}$ and fix an integer $r \in [0, n]$.
Consider the point process $X$ for which the random variables $X\{k\}$ sum to $r$
and constitute an exchangeable Bernoulli sequence of length $n$. Describe the
distributions of the random $n$-tuple $(X\{1\}, X\{2\}, \dots, X\{n\})$ and the
random numbers $X(B)$, $B \subseteq \Psi$.


Problem 7. Modify the preceding problem by deleting the parameter $r$ and
changing “exchangeable” to “iid with success probability $\tfrac{1}{2}$”.


* Problem 8. Let $\Psi = \{1, 2, \dots, n\}$ and $r \in \mathbb{Z}^+$. Consider the
point process $X$ defined by

    $P[X\{k\} = r_k,\ 1 \le k \le n] = \frac{r!}{n^r \prod_{k=1}^{n} r_k!}$  if $\sum_{k=1}^{n} r_k = r$,
    and $0$ otherwise.

Calculate the distribution of $X(B)$ for each $B \subseteq \Psi$.


Problem 9. At times we extend the use of the term ‘point process’ to a random
variable which with probability 1 equals a $\overline{\mathbb{Z}}^+$-valued Radon
measure, but which may have some other value or values on a null set. This problem
illustrates such an extended use. Let $(S_n : n = 0, 1, 2, \dots)$ be a
$\mathbb{Z}^+$-valued random walk with a step distribution whose support is the
two-point set $\{0, 1\}$. Describe the relevance of the opening sentence of this
problem to the random Radon measure defined by

    $R\{y\} = \#\{n : S_n = y\}$.




We conclude this section with a criterion for checking that two point processes
have the same distribution. Since the $\sigma$-field $\mathcal{G}$ on which the distribution $Q$ of
a point process is defined is somewhat complicated, we seek a relatively small
subcollection $\mathcal{E}$ of $\mathcal{G}$ such that the values of $Q$ on $\mathcal{E}$ determine the values of $Q$
on $\mathcal{G}$. The following result is a straightforward consequence of the Uniqueness
of Measure Theorem in Chapter 7.


Proposition 6. Let $Q$ and $R$ be probability measures on the measurable space
of $\overline{\mathbb{Z}}^+$-valued Radon measures on a locally compact Polish space $\Psi$. If

$$Q\{\mu\colon \mu(B_i) = z_i,\ 1 \le i \le m\} = R\{\mu\colon \mu(B_i) = z_i,\ 1 \le i \le m\}$$

for every finite collection $\{B_1,\dots,B_m\}$ of disjoint compact subsets of $\Psi$ and
nonnegative integers $z_i$, then $Q = R$.


29.2. Intensity measures 


In this section, we introduce a concept for point processes that plays a role 
similar to that played by expectations for R-valued random variables. 


Definition 7. Let $X$ be a point process on a locally compact Polish space
$\Psi$ with Borel $\sigma$-field $\mathcal{A}$. The set function $A \mapsto E(X(A))$ from $\mathcal{A}$ to $\overline{\mathbb{R}}^+$ is the
intensity measure of $X$.


Note that if $X$ is a point process on $\Psi$ whose intensity measure assigns the
value 0 to $A^c$ for some Borel set $A$, then $X$ is a point process on $A$. The following
proposition shows that intensity measures are indeed measures.


Proposition 8. The intensity measure of a point process is a measure (not
necessarily Radon nor even $\sigma$-finite).


Problem 10. Prove the preceding proposition twice: by using the Monotone Con- 
vergence Theorem and again by using the Fubini Theorem. 


Problem 11. Let $X_1, X_2,\dots,X_n$ be iid random variables taking values in a locally
compact Polish space $\Psi$. Show that the intensity measure of the point process
$\delta_{X_1} + \delta_{X_2} + \cdots + \delta_{X_n}$ equals $nQ$, where $Q$ denotes the common distribution of the
random variables $X_k$.


Problem 12. Let $X_1, X_2,\dots$ be iid random variables taking values in a locally
compact Polish space $\Psi$. Let $N$ be a $\mathbb{Z}^+$-valued random variable independent of
the sequence $(X_1, X_2,\dots)$. Express the intensity measure of the point process

$\delta_{X_1} + \delta_{X_2} + \cdots + \delta_{X_N}$ in terms of the distributions of $N$ and $X_1$.


* Problem 13. Explain how a renewal sequence (defined in Chapter 25) may be 
regarded as a point process. Express its intensity measure in terms of a quantity 
or quantities associated with the renewal sequence. 




29.3. Poisson point processes 


In this section we will see how to construct point processes $X$ having a given
intensity measure and also having the property that $X(A)$ and $X(B)$ are
independent random variables for disjoint $A$ and $B$. Each random variable $X(A)$
will have a standard Poisson distribution, with the understanding that the
standard Poisson distributions with mean 0 and $\infty$ are, respectively, $\delta_0$ and $\delta_\infty$.
We begin with a problem showing that only Radon intensity measures should
be considered, and then treat finite intensity measures before considering the
general situation.


Problem 14. Let $X$ be a point process on a locally compact Polish space $\Psi$.
Suppose for each Borel subset $A \subseteq \Psi$, that $X(A)$ has a standard Poisson distribution.
Explain why the intensity measure of $X$ is a Radon measure.


Lemma 9. For every finite measure $\nu$ on a locally compact Polish space $\Psi$,
there exists a point process $X$ with intensity measure $\nu$ such that for each finite
measurable partition $\{A_1,\dots,A_d\}$ of $\Psi$, the random variables $X(A_1),\dots,X(A_d)$
are independent and Poisson distributed.


PROOF. Let $\nu$ be a finite measure on $\Psi$. We may assume that $\nu(\Psi) > 0$ since
the case $\nu(\Psi) = 0$ is trivial. Let $(Y_1, Y_2,\dots)$ be an iid sequence having common
distribution $\nu/\nu(\Psi)$ and $N$ a Poisson random variable that is independent of
$(Y_1, Y_2,\dots)$ and has mean $\nu(\Psi)$. For $n = 0,1,2,\dots$, introduce point processes

$$X_n = \delta_{Y_1} + \cdots + \delta_{Y_n},$$

with the understanding that $X_0$ equals the zero measure. We want to prove that
by using $N$ to randomize the number of terms in the sum, we obtain a point
process having the desired properties.

Let $\{A_1,\dots,A_d\}$ be a measurable partition of $\Psi$, and for each $n$ set

$$Z_n = (X_n(A_1),\dots,X_n(A_d)).$$

Clearly, the distribution of $Z_n$ is multinomial with parameters $n$ and $\nu(A_i)/\nu(\Psi)$, $1 \le i \le d$.
By Problem 18 of Chapter 10, the coordinates of $Z_N$ are independent and the
$i$th coordinate of $Z_N$ is Poisson with mean $\frac{\nu(A_i)}{\nu(\Psi)}\,\nu(\Psi) = \nu(A_i)$. Therefore the
point process $X_N$ has the desired properties. $\square$
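
The randomization in the proof of Lemma 9 can be tried numerically. The following is a minimal sketch and not part of the argument above: it assumes the illustrative choice $\nu = 3 \times$ Lebesgue measure on $[0,1)$ with the partition $A_1 = [0, 0.5)$, $A_2 = [0.5, 1)$, and the function names are hypothetical. It draws $N$, places $N$ iid points with distribution $\nu/\nu(\Psi)$, and estimates the means and the covariance of the two counts, which should be close to $1.5$, $1.5$, and $0$.

```python
import math
import random


def sample_poisson(mean, rng):
    """Draw a Poisson variate by Knuth's multiplication method (fine for small means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_points(total_mass, draw_point, rng):
    """Return Y_1, ..., Y_N, where N is Poisson with mean nu(Psi) = total_mass."""
    return [draw_point(rng) for _ in range(sample_poisson(total_mass, rng))]


def partition_count_moments(trials=20000, seed=0):
    """Estimate the means and the covariance of X(A_1) and X(A_2).

    Assumed setting: nu = 3 * Lebesgue measure on [0, 1),
    A_1 = [0, 0.5) and A_2 = [0.5, 1), so nu(A_1) = nu(A_2) = 1.5.
    """
    rng = random.Random(seed)
    sum1 = sum2 = sum12 = 0.0
    for _ in range(trials):
        points = sample_points(3.0, lambda r: r.random(), rng)
        c1 = sum(1 for y in points if y < 0.5)   # X(A_1)
        c2 = len(points) - c1                    # X(A_2)
        sum1 += c1
        sum2 += c2
        sum12 += c1 * c2
    mean1, mean2 = sum1 / trials, sum2 / trials
    return mean1, mean2, sum12 / trials - mean1 * mean2
```

A near-zero covariance is of course only a necessary symptom of independence; the proof obtains full independence from the multinomial structure of $Z_n$.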


Theorem 10. For every Radon measure $\nu$ on a locally compact Polish space
$\Psi$, there exists a point process $X$ with intensity measure $\nu$ such that for each finite
measurable partition $\{A_1,\dots,A_d\}$ of $\Psi$, the random variables $X(A_1),\dots,X(A_d)$
are independent and Poisson distributed. Moreover, any two such point processes
with the same intensity measure have the same distribution.




PROOF. If $\nu(\Psi) < \infty$ the existence result is contained in the preceding lemma,
so we assume that $\nu(\Psi) = \infty$. As in the proof of Proposition 2, choose a
countably infinite measurable partition $(\Psi_1, \Psi_2,\dots)$ of $\Psi$ such that $0 < \nu(\Psi_k) <
\infty$ for each $k$. Let $\nu_k$ be the restriction of $\nu$ to $\Psi_k$.

Construct independent point processes $X_k$ corresponding to $\nu_k$ as in the
preceding lemma, and set $X = \sum_{k=1}^\infty X_k$. To prove that $X$ has the desired property
we let $\{A_1,\dots,A_d\}$ be a measurable partition of $\Psi$. For $1 \le i \le d$,

$$(29.3) \qquad X(A_i) = \sum_{k=1}^\infty X_k(A_i).$$

This sum of independent Poisson random variables is Poisson and its mean,
being the sum of the means, equals $\sum_{k=1}^\infty \nu_k(A_i) = \sum_{k=1}^\infty \nu(A_i \cap \Psi_k) = \nu(A_i)$.
From the preceding lemma and the fact that the point processes $X_k$, $k \ge 1$, are
independent, it follows that the random variables $X_k(A_i)$, $1 \le i \le d$, $k \ge 1$,
are independent. Hence the random variables $X(A_i)$, $1 \le i \le d$, are
independent. The final assertion of the theorem is an immediate consequence of
Proposition 6. $\square$


A point process having the properties given in Theorem 10 is called a Poisson
point process with intensity measure $\nu$. The proofs of that theorem and Lemma 9
give a method for constructing Poisson point processes with arbitrary Radon
intensity measures.

The most important Poisson point processes are those for which the intensity 
measure gives each one-point set measure 0. Proposition 12 below states that 
if X is such a Poisson point process, then X is a random set (rather than a 
random multiset) with probability 1. A key step in the proof of this result is the 
following fact about locally compact Polish spaces. 


Lemma 11. Let $\Psi$ be a locally compact Polish space, and let $\mu$ be a Radon
measure on $\Psi$ that assigns measure 0 to each one-point set. Then for all $\delta > 0$,
there exists a countable measurable partition
of $\Psi$ such that each set in the partition has measure less than $\delta$.


PROOF. It follows from Proposition 2 that we may restrict our attention to
the case in which $\Psi$ is compact. Since each point $x$ in $\Psi$ has measure 0, the proof
of Proposition 5 shows that there exists an $\varepsilon_x > 0$ such that $\mu(B(x,\varepsilon_x)) < \delta$. The
collection of balls $B(x,\varepsilon_x)$ forms an open covering of the compact set $\Psi$, so there
is a finite subcovering. Each member of the finite subcovering has measure less
than $\delta$. It is now an easy matter to construct the desired measurable partition
of $\Psi$. $\square$


Proposition 12. Let X be a Poisson point process whose intensity measure 
assigns the value 0 to each one-point set. Then with probability 1, every member 
of the support of X has multiplicity 1. 




When X satisfies the conditions of the preceding proposition, it is natural to 
call X a random discrete set rather than a random discrete multiset. 


Problem 15. Prove Proposition 12. Hint: (i) Show that one only need consider the
case where the underlying Polish space $\Psi$ is compact and the intensity measure
is finite. (ii) Use Lemma 11 to show that for each $\delta > 0$, $\Psi$ has a measurable
partition consisting of sets to which the intensity measure assigns value less than
$\delta$. (iii) Find an upper bound for the probability that the point process assigns
value greater than 1 to at least one of these sets.


Problem 16. Give an example of a point process (not Poisson) for which the intensity
measure assigns value 0 to each one-point set, but which, with positive probability,
is a random multiset with at least one member having multiplicity greater than 1.


Problem 17. [Uniqueness of the Poisson distribution] Let $X$ be a point process
whose intensity measure is a Radon measure that assigns the value 0 to each
one-point set. Use the Central Limit Theorem for triangular arrays (given in
Chapter 16) to prove that if the random variables $X(A_1),\dots,X(A_n)$ are independent
for each finite sequence $A_1,\dots,A_n$ of pairwise disjoint measurable sets, then $X(A)$
has a compound Poisson distribution for each measurable set $A$. Also show that if
$X$ is a.s. a random set (rather than a random multiset), then $X(A)$ has a Poisson
distribution for each measurable $A$.


29.4. Examples of Poisson point processes 


A variety of interesting calculations arise in connection with specific Poisson 
point processes. 


Example 1. Let $X$ be the Poisson point process the intensity measure of
which is counting measure on $\mathbb{Z}$, set $V = \inf\{n \ge 0\colon X\{n\} \ge 1\}$, and set
$U = \sup\{n \le 0\colon X\{n\} \ge 1\}$. We will calculate the distribution of $Z = V - U$,
the length of the longest open interval in $\mathbb{R}$ that contains the origin but no
member of the support of $X$.

Of course, the distribution of —U is the same as the distribution of V. Thus, 
if U and V were independent we would obtain the answer by finding the distri- 
bution of V and then convolving it with itself. But U and V are not independent 
since one of them equals 0 if and only if the other one equals 0. 

It is easy to see that $-U$ and $V$ are conditionally independent given $\sigma(D_0)$,
where $D_0 = \{\omega\colon X(\omega,\{0\}) \ge 1\}$, and that they have identical conditional
distributions. The conditional distribution of $V$ given $\sigma(D_0)$ is given by

$$P(V = v \mid \sigma(D_0))(\omega) = \begin{cases} 1 & \omega \in D_0,\ v = 0 \\ 0 & \omega \in D_0,\ v = 1,2,\dots \\ 0 & \omega \notin D_0,\ v = 0 \\ (e-1)e^{-v} & \omega \notin D_0,\ v = 1,2,\dots. \end{cases}$$




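Taking the convolution of this conditional distribution with itself gives

$$P(Z = z \mid \sigma(D_0))(\omega) = \begin{cases} 1 & \omega \in D_0,\ z = 0 \\ 0 & \omega \in D_0,\ z = 1,2,\dots \\ 0 & \omega \notin D_0,\ z = 0 \\ (e-1)^2 (z-1) e^{-z} & \omega \notin D_0,\ z = 1,2,\dots. \end{cases}$$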


For each fixed $z$ we take the expected value of this random variable in order to
obtain the (unconditional) probability that $Z = z$:

$$P(Z = z) = \begin{cases} 1 - e^{-1} & \text{if } z = 0 \\ (e-1)^2 (z-1) e^{-(z+1)} & \text{if } z \ge 1. \end{cases}$$
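
The distribution of $Z$ obtained in Example 1 can be compared with a direct simulation. The sketch below is illustrative only, with hypothetical function names. It uses the fact that the events $[X\{n\} \ge 1]$ are independent with probability $1 - e^{-1}$ each, searches to the right and to the left of the origin, and tabulates the empirical frequencies of $Z = V - U$ next to the closed form.

```python
import math
import random

P_HIT = 1.0 - math.exp(-1.0)  # P[X{n} >= 1] when X{n} is Poisson with mean 1


def pmf_z(z):
    """P(Z = z) as obtained at the end of Example 1."""
    if z == 0:
        return P_HIT
    return (math.e - 1.0) ** 2 * (z - 1) * math.exp(-(z + 1))


def first_hit(rng):
    """Smallest n >= 1 with X{n} >= 1; the sites are independent."""
    n = 1
    while rng.random() >= P_HIT:
        n += 1
    return n


def sample_z(rng):
    """Draw Z = V - U from the occupancy pattern of the integer sites."""
    if rng.random() < P_HIT:                 # the origin is occupied, so U = V = 0
        return 0
    return first_hit(rng) + first_hit(rng)   # V and -U, searched independently


def empirical_pmf(trials=40000, seed=1):
    rng = random.Random(seed)
    counts = {}
    for _ in range(trials):
        z = sample_z(rng)
        counts[z] = counts.get(z, 0) + 1
    return {z: c / trials for z, c in counts.items()}
```

Comparing `empirical_pmf()` with `pmf_z` at a few values of $z$ gives a quick sanity check; the value $z = 1$ never occurs, in agreement with the factor $(z-1)$.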


Problem 18. Show that the distribution of the random variable $V$ in the preceding
example is geometric and given by $P[V = v] = (1 - e^{-1})e^{-v}$ for $v = 0,1,2,\dots$.


Problem 19. Find the distribution of $\inf\{n \ge 0\colon X\{n\} \ge 2\}$ for $X$ defined as in
Example 1.


Problem 20. Let $c \in (0,\infty)$. For the Poisson point process $X$ on $\mathbb{R}^2$ with intensity
measure equal to $c^2$ times Lebesgue measure, calculate the distribution function of
$\inf\{|y|\colon y \in X\}$, with $X$ viewed as a random set.


Problem 21. For the Poisson point process $X$ on $\mathbb{R}$ with intensity measure equal
to $c$ times Lebesgue measure, calculate the distribution, expectation, and variance
of $V - U$, where $V = \inf\{y > 0\colon X\{y\} \ge 1\}$ and $U = \sup\{y < 0\colon X\{y\} \ge 1\}$.


Problem 22. Let $X$ denote the Poisson point process the intensity measure of
which is Lebesgue measure on $\mathbb{R}$. For each $t \in \mathbb{R}$, let $V_t$ be the smallest member of
the support of $X$ that is no smaller than $t$ and let $U_t$ be the largest member of the
support of $X$ that is no larger than $t$. Calculate the distributions of $(V_t - t, t - U_t)$,
$V_t - U_t$, and $(V_t - U_t, t - U_t)$. Also, calculate the correlations $\mathrm{Corr}(V_t - t, V_s - s)$
and $\mathrm{Corr}(V_t - U_t, V_s - U_s)$ and calculate the limits of these functions as $s \to t$.


Problem 23. Show that if $Y$ is a Poisson point process on $\mathbb{R}^+ \setminus \{0\}$ with intensity
measure $c\lambda$, where $\lambda$ is Lebesgue measure, then the distances between successive
members of $Y \cup \{0\}$ are iid and exponentially distributed with mean $c^{-1}$.
Equivalently, let $W$ be the image of a random walk with exponentially distributed steps
having mean $c^{-1}$, and show that $W \setminus \{0\}$ is a Poisson point process on $\mathbb{R}^+ \setminus \{0\}$,
with intensity measure $c\lambda$.


Problem 24. Let $Z$ be a Poisson point process on $(0,\infty) \times \Psi$ with intensity measure
$\lambda \times \mu$, where $\Psi$ is a locally compact Polish space, $\lambda$ is Lebesgue measure on $(0,\infty)$,
and $\mu$ is a probability measure on $\Psi$. Show that with probability 1, $Z$ can be
written as $Z = \{(U_n, V_n)\colon n = 1,2,\dots\}$, in such a way that the sequence $U =
(U_0 = 0 < U_1 < U_2 < \cdots)$ is a random walk on $\mathbb{R}^+$ with exponentially distributed
steps having mean 1, the sequence $V = (V_1, V_2,\dots)$ is independent of $U$, and $V$ is
iid with common distribution $\mu$.




Example 2. For the Poisson point process $X$ on $\mathbb{R}^2$ with intensity measure
equal to $c^2$ times Lebesgue measure, let us calculate the expectation of the area
$U$ of the random region consisting of those points in $\mathbb{R}^2$ which are closer to the
origin than to any member of the support of $X$. Denote the underlying probability
space by $(\Omega, \mathcal{F}, P)$.

We first note that the set of points closer to the origin than to some particular
other point is the set of points on one side of the perpendicular bisector of the line
connecting the origin to the other point, a set known as a ‘half-plane’. Thus the
random region of points closer to the origin than to any member of the support
of $X$ is the intersection of half-planes, a set known to be convex. Since, with
probability one, it contains the origin, its boundary can be represented in polar
coordinates by a random radial function $\omega \mapsto [\theta \mapsto R(\theta,\omega)]$ of the polar angle $\theta$.

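A standard polar-coordinate formula gives

$$E(U) = \int_\Omega \left( \frac12 \int_0^{2\pi} R^2(\theta,\omega)\, d\theta \right) P(d\omega),$$

an iterated integral. It will develop, for fixed $\theta$, that $R^2(\theta,\cdot)$ is measurable,
and it is clear, for fixed $\omega$, that $R^2(\cdot,\omega)$ is continuous. From Proposition 9 of
Chapter 9 it will then follow that $(\theta,\omega) \mapsto R(\theta,\omega)$ is measurable and thus that
the Fubini Theorem can be applied to give

$$E(U) = \frac12 \int_0^{2\pi} \left( \int_\Omega R^2(\theta,\omega)\, P(d\omega) \right) d\theta = \frac12 \int_0^{2\pi} E(R^2(\theta,\cdot))\, d\theta.$$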

In order to establish the preceding formula we need to show that $R^2(\theta,\cdot)$ is
measurable, which we will do simultaneously with calculating the distribution
of $R(\theta,\cdot)$, and then we will use this distribution to calculate $E(R^2(\theta,\cdot))$. Before
proceeding we note that a calculation valid for one value of $\theta$ is, with the obvious
rotational changes, valid for all values of $\theta$, so we need only consider $\theta = 0$. Then
the above expression for $E(U)$ reduces to

$$(29.4) \qquad E(U) = \pi E(R^2(0,\cdot)).$$


For any $r > 0$, the probability that $R(0,\cdot) > r$ is the probability that there
is no point of the support of $X$ that has the property that the perpendicular
bisector of the segment between it and the origin intersects the 0 angle polar ray
at a distance less than $r$ from the origin. Let $\kappa_r$ denote the Lebesgue measure
of the set of points $B_r$ in $\mathbb{R}^2$ having this property. Then

$$(29.5) \qquad P[R(0) > r] = e^{-c^2 \kappa_r}.$$


Consider a point with polar coordinate description $[s,\varphi]$, $s > 0$ and $|\varphi| < \pi$.
The midpoint of the line segment between it and the origin is $[\frac{s}{2},\varphi]$. In order
that $[s,\varphi] \in B_r$ the point $[\frac{s}{2},\varphi]$ must be the vertex of the right angle in the right
triangle having angle $|\varphi|$ at the origin and hypotenuse of length less than $r$ lying
along a portion of the positive horizontal axis. This condition is equivalent to
the inequality $s < 2r\cos\varphi$, from which it follows that


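$$\kappa_r = \int_{-\pi/2}^{\pi/2} \int_0^{2r\cos\varphi} s \, ds \, d\varphi = 4r^2 \int_0^{\pi/2} \cos^2\varphi \, d\varphi = 2r^2 \int_0^{\pi/2} (1 + \cos 2\varphi) \, d\varphi = \pi r^2.$$

From Proposition 19 of Chapter 4 we see that

$$E(R^2(0)) = 2 \int_0^\infty r\, P[R(0) > r] \, dr.$$

So, from (29.4) and (29.5) we conclude that

$$E(U) = 2\pi \int_0^\infty r e^{-c^2 \pi r^2} \, dr = c^{-2}.$$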
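
The two integrals at the end of Example 2 are easy to check numerically. The following sketch is illustrative only: the function names and the choices of step counts and truncation point are assumptions, and it uses a plain midpoint rule rather than any particular library routine. It evaluates the double integral defining $\kappa_r$ and the integral $2\pi \int_0^\infty r e^{-c^2 \pi r^2}\,dr$, which should agree with $\pi r^2$ and $c^{-2}$ respectively.

```python
import math


def kappa(r, steps=400):
    """Midpoint-rule value of the double integral defining kappa_r."""
    d_phi = math.pi / steps
    total = 0.0
    for i in range(steps):
        phi = -math.pi / 2 + (i + 0.5) * d_phi
        upper = 2.0 * r * math.cos(phi)      # s ranges over (0, 2 r cos(phi))
        d_s = upper / steps
        inner = sum((j + 0.5) * d_s for j in range(steps)) * d_s
        total += inner * d_phi
    return total


def expected_area(c, upper=10.0, steps=200000):
    """Midpoint-rule value of 2*pi * integral_0^upper of r * exp(-c^2 pi r^2) dr.

    The truncation at `upper` is harmless when c^2 * pi * upper^2 is large.
    """
    d_r = upper / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * d_r
        total += r * math.exp(-(c ** 2) * math.pi * r ** 2)
    return 2.0 * math.pi * total * d_r
```

For instance, `expected_area(1.5)` should be close to $1/2.25$, reflecting that the mean area of the region is the reciprocal of the intensity $c^2$.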


Problem 25. Calculate the expectation of the perimeter of the random region 
treated in the preceding example. 


* Problem 26. For the Poisson point process $X$ on $\mathbb{R}^3$ with intensity measure equal to
$c^3$ times Lebesgue measure, calculate the expectation of the volume of the random
region in $\mathbb{R}^3$ consisting of those points that are closer to the origin than to any
member of the support of $X$.


Problem 27. Generalize the preceding problem by replacing “3” by d. 


Problem 28. Let $X$ be the Poisson point process on $\mathbb{R}$ with intensity measure equal
to Lebesgue measure, and let $S$ consist of those real numbers which lie midway
between two consecutive members of the support of $X$. Let $J$ be an interval of
length $l$. Show that the probability that $J$ contains no member of $S$ equals

$$(29.6) \qquad (1 + l)\, e^{-2l}.$$

Show that the intensity measure of $S$ is Lebesgue measure and then use (29.6) to
show that it is not a Poisson point process.


In Example 6 of Chapter 7 we introduced the measure space $(\mathcal{L}, \mathcal{A}, \nu)$, where $\mathcal{L}$
is the space of all lines in $\mathbb{R}^2$. To obtain the $\sigma$-field $\mathcal{A}$ and the measure $\nu$ we made
use of a one-to-one correspondence between $\mathcal{L}$ and $\{(s,\varphi)\colon s \in \mathbb{R},\ 0 \le \varphi < \pi\}$.
The members of $\mathcal{A}$ are the sets which correspond to Borel subsets of $\mathbb{R} \times [0,\pi)$
and the measure $\nu$ is induced by Lebesgue measure on $\mathbb{R} \times [0,\pi)$. We may call a
particular $(s,\varphi)$ the coordinates of the corresponding member of $\mathcal{L}$, and in fact,
we will speak of the line $(s,\varphi)$. Recall that $|s|$ is the distance in $\mathbb{R}^2$ from the
origin to the line $(s,\varphi)$ and $\varphi$ or $\varphi + \pi$ denotes the polar angle of the point on
the line closest to the origin, with a suitable interpretation in case $s = 0$. The
metric

$$\rho((s_1,\varphi_1),(s_2,\varphi_2)) = \sqrt{[s_2 - s_1]^2 + \bigl[\,|\varphi_2 - \varphi_1| \wedge |\varphi_2 - \varphi_1 + \pi| \wedge |\varphi_2 - \varphi_1 - \pi|\,\bigr]^2}$$

gives a locally compact Polish structure on $\mathcal{L}$. We will omit the straightforward
arguments that $\mathcal{A}$ is the Borel $\sigma$-field and $\nu$ is a Radon measure. Notice that
$\nu\{l\} = 0$ for every $l \in \mathcal{L}$. By Proposition 12, a Poisson point process corresponding to
$\nu$ assigns, with probability 1, measure $\le 1$ to every one-point set. Thus, it is
natural to call such a point process a ‘random set of lines’. Figure 29.1 is relevant
for some of the following problems.


FIGURE 29.1. Poisson point process of lines


Problem 29. Let $X$ be a random set of lines in $\mathbb{R}^2$ having a Poisson distribution
with intensity measure $\nu$ as described above. Let $D(\omega)$ denote the region consisting
of those points in $\mathbb{R}^2$ having the following property: segments from the origin to
them do not intersect any members of $X(\omega)$. Explain why $D$ is, with probability
1, an open bounded polygonal region. Calculate the means of the perimeter and
area of $D$. Hint: Use Problem 25 of Chapter 7.


Problem 30. For the point process $X$ described in the preceding problem let $V(\omega)$
be that point on the positive horizontal axis that is closest to the origin among
all such points lying on a member of $X(\omega)$ and let $\Phi(\omega)$ be the polar coordinate
angle of that (unique with probability one) member of $X(\omega)$. Prove that $V$ and
$\Phi$ are independent, that $V$ has a standard exponential distribution with mean $\tfrac12$,
and that $\Phi$ has density

$$\varphi \mapsto \tfrac12 |\cos\varphi|, \qquad 0 \le \varphi < \pi.$$


Problem 31. For the region $D$ described in Problem 29, let $R$ denote the function
that describes its boundary in polar coordinates, with distance from the origin
being given as a function of angle. For each $\theta$ calculate the distribution of the
random pair $(R(\theta), R'(\theta))$.


In many situations there is attached to a Polish space a group of measurable
permutations of that space. For the space $\mathcal{L}$ of Problem 29 the usual group $G$ of
permutations consists of the isometries of $\mathbb{R}^2$. Each isometry of $\mathbb{R}^2$ takes a
member of $\mathcal{L}$ to a member of $\mathcal{L}$. The measure $\nu$ of Problem 29 is a natural measure
to be used in conjunction with $G$ since it is $G$-invariant (or just invariant if $G$
is understood), that is, $\nu(A) = \nu(g^{-1}(A))$ for every $g \in G$ and every Borel set
$A$. It can be shown that multiples of $\nu$ are the only $G$-invariant Radon measures
on $\mathcal{L}$.

The isometries of $\mathbb{R}^d$ constitute a natural group $H$ to be associated with the
Polish space $\mathbb{R}^d$. The multiples of $d$-dimensional Lebesgue measure are the only
$H$-invariant Radon measures on $\mathbb{R}^d$.

In general, a point process $X$ on a locally compact Polish space to which is
attached a group $G$ of measurable permutations is $G$-invariant if $\hat{g} \circ X$ has the
same distribution as $X$ for every $g \in G$.


Proposition 13. The intensity measure of an invariant point process is in- 
variant. A Poisson point process with an invariant intensity measure is invari- 
ant. 


Problem 32. Prove the preceding proposition. 


Problem 33. Give an example of a noninvariant point process whose intensity mea- 
sure is an invariant Radon measure. 


In view of Proposition 13, the Poisson point process of lines illustrated by
Figure 29.1 is invariant under the isometries of $\mathbb{R}^2$.


29.5. † Probability generating functionals


The probability generating functional of a point process $X$ on a locally compact
Polish space $\Psi$ is the functional

$$h \mapsto E\Bigl( \prod_{\psi \in \Psi} h(\psi)^{X\{\psi\}} \Bigr),$$

defined for continuous $[0,1]$-valued functions $h$ for which the set $\{x\colon h(x) < 1\}$
is relatively compact. The infinite product above has only finitely many factors
different from 1 and the factors that equal 1 can be ignored. In terms of the
distribution $Q$ of $X$, the probability generating functional is given by

$$h \mapsto \int \prod_{\psi \in \Psi} h(\psi)^{\mu\{\psi\}} \, Q(d\mu) = \int e^{-\int \log(1/h)\, d\mu} \, Q(d\mu),$$

where the integration is taken over the space of $\overline{\mathbb{Z}}^+$-valued Radon measures on
$\Psi$, $\log(1/0) = \infty$, $0 \cdot \infty = 0$, and $e^{-\infty} = 0$.


Example 3. The functional

$$h \mapsto \sum_{j=0}^\infty 2^{-(j+1)} \prod_{i=1}^j h(i)$$

is the probability generating functional of the random subset of $\mathbb{Z}^+ \setminus \{0\}$ that
equals $\emptyset$ with probability $\tfrac12$, $\{1\}$ with probability $\tfrac14$, $\{1,2\}$ with probability $\tfrac18$,
and so forth. This random set may be regarded as the set of times of tail flips
preceding the first head flip in an infinite sequence of fair coin flips.
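
The functional of Example 3 can be evaluated exactly whenever $h$ differs from 1 at only finitely many sites. The sketch below is an illustration under that assumption; the representation of $h$ as a dictionary and the function name are hypothetical choices. Beyond the last site where $h < 1$ the product stops changing, so the tail of the series is a geometric sum.

```python
def coin_flip_pgf(h):
    """Evaluate sum_{j>=0} 2^{-(j+1)} * prod_{i=1}^{j} h(i).

    `h` is a dict {site: value in [0, 1]}; sites not listed have h = 1,
    which matches the requirement that {h < 1} be relatively compact.
    """
    last = max(h, default=0)
    total, product = 0.0, 1.0
    for j in range(last + 1):
        if j >= 1:
            product *= h.get(j, 1.0)
        total += product * 2.0 ** -(j + 1)
    # For j > last the product no longer changes; the remaining
    # weights 2^{-(j+1)} sum to 2^{-(last+1)}.
    return total + product * 2.0 ** -(last + 1)
```

Taking $h(2) = 0$ and $h = 1$ elsewhere gives the probability that 2 is not in the random set, namely $\tfrac12 + \tfrac14$, the total probability of $\emptyset$ and $\{1\}$.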


Problem 34. Calculate the probability generating functional for each of the cases
$r = 1$ and $r = n - 1$ in Problem 6.


Problem 35. Calculate the probability generating functional for the case r = 3, n = 
4 in Problem 8. 


Problem 36. Decide if the functional

$$h \mapsto \sum_{j=1}^\infty \frac{h(j)\,h(j+1)}{j(j+1)}$$

is the probability generating functional of some point process on $\mathbb{Z}^+ \setminus \{0\}$. If so,
describe such a point process.


Problem 37. Follow the instructions of the preceding problem for the functional 


(09) 


a So POY 
MA T 


Jal 


Theorem 14. Point processes that have the same probability generating func- 
tional have the same distribution. 


PROOF. Let $Q$ and $R$ be as in Proposition 6, and suppose that their probability
generating functionals are the same. Let $B_i$, $1 \le i \le m$, be as in that
proposition. Fix numbers $s_i \in [0,1]$ and set

$$g = \prod_{i=1}^m s_i^{I_{B_i}},$$

where $I_{B_i}$ denotes the indicator function of $B_i$.
Choose a decreasing sequence of functions $h_n$ in the domain of the probability
generating functional of $Q$ (and thus of $R$) such that, for each $\psi \in \Psi$, $h_n(\psi) \to
g(\psi)$ as $n \to \infty$. By the Dominated Convergence Theorem we obtain

$$\int \prod_{i=1}^m s_i^{\mu(B_i)} \, Q(d\mu) = \int \prod_{i=1}^m s_i^{\mu(B_i)} \, R(d\mu).$$

Now treat $s_1,\dots,s_m$ as variables. For arbitrary $z_i \in \mathbb{Z}^+$, $1 \le i \le m$, we
equate the coefficients of $\prod_{i=1}^m s_i^{z_i}$ in these equal Taylor series to obtain

$$Q\{\mu\colon \mu(B_i) = z_i,\ 1 \le i \le m\} = R\{\mu\colon \mu(B_i) = z_i,\ 1 \le i \le m\}.$$

An appeal to Proposition 6 completes the proof. $\square$


Proposition 15. The probability generating functional of a Poisson point
process on a locally compact Polish space $\Psi$ having intensity measure $\nu$ is given by

$$h \mapsto e^{-\int_\Psi (1 - h)\, d\nu}.$$


PARTIAL PROOF. Let $X$ denote a Poisson point process having intensity
measure $\nu$ on a locally compact Polish space $\Psi$. We first focus on $[0,1]$-valued
functions $h$ (not necessarily continuous) with finite image $\{h_1,\dots,h_m\}$, such that
$\{\psi\colon h(\psi) < 1\}$ is relatively compact. For each $i$, let $A_i = \{\psi\colon h(\psi) = h_i\}$. Then

$$E\Bigl( \prod_{\psi \in \Psi} h(\psi)^{X\{\psi\}} \Bigr) = E\Bigl( \prod_{i=1}^m h_i^{X(A_i)} \Bigr) = \prod_{i=1}^m E\bigl( h_i^{X(A_i)} \bigr) = \prod_{i=1}^m e^{-(1-h_i)\nu(A_i)} = e^{-\int_\Psi (1-h)\, d\nu},$$

as desired. It is left to the reader to treat arbitrary $[0,1]$-valued continuous $h$ for
which $\{\psi\colon h(\psi) < 1\}$ is relatively compact. $\square$
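
The chain of equalities in the partial proof rests on the identity $E(s^N) = e^{-(1-s)m}$ for a Poisson variable $N$ with mean $m$. As a hedged numerical illustration (function names and the truncation of the series are assumptions made for this sketch), the code below sums the Poisson series for each set $A_i$ of a simple function $h$, multiplies the factors, and compares the result with the closed form $e^{-\int (1-h)\,d\nu}$.

```python
import math


def poisson_pgf_series(mean, s, terms=120):
    """E(s^N) for N Poisson with the given mean, summed term by term."""
    term = math.exp(-mean)          # k = 0 term: e^{-mean} * (mean*s)^0 / 0!
    total = term
    for k in range(1, terms):
        term *= mean * s / k
        total += term
    return total


def pgf_simple_function(values, masses):
    """Product over i of E(h_i^{X(A_i)}), with X(A_i) Poisson of mean nu(A_i)."""
    result = 1.0
    for h_i, mass in zip(values, masses):
        result *= poisson_pgf_series(mass, h_i)
    return result


def closed_form(values, masses):
    """exp(-integral of (1 - h) d nu) for the same simple function h."""
    return math.exp(-sum((1.0 - h_i) * mass for h_i, mass in zip(values, masses)))
```

With `values = [0.0, 0.3, 1.0]` and `masses = [0.7, 2.0, 5.0]`, the two functions should agree to within rounding error; the set on which $h = 1$ contributes the factor 1 to both.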


Problem 38. Finish the proof of the preceding proposition. 


* Problem 39. Calculate the probability generating functionals of Poisson point pro- 
cesses whose intensity measures are counting measures on countable sets. 


Problem 40. Construct a metric that makes $(0,\infty)$ into a locally compact
Polish space. Then show that there is a Poisson point process $X$ on $(0,\infty)$ whose
probability generating functional has the form

$$h \mapsto e^{-\int_0^\infty h'(t) \log t \, dt}$$

for those $h$ in its domain satisfying the additional condition that $h'$ exists and
is continuous. Then calculate the distributions of $\inf\{t > 1\colon X\{t\} \ge 1\}$ and
$\sup\{t < 1\colon X\{t\} \ge 1\}$.




We will not fully address the question of which functionals are probability 
generating functionals of some point process, nor will we address the general 
existence issue of a point process when probabilities of events of the form 


$$\{\mu\colon \mu(B_i) = z_i,\ 1 \le i \le m\}$$


are specified. However, the following proposition provides a necessary condition 
for a functional to be a probability generating functional. 


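Proposition 16. Let $\mathfrak{F}$ be a probability generating functional of some point
process. Let $h_m$, $m = 1,2,\dots$, and $h$ be in the domain of $\mathfrak{F}$, and suppose that
$h_m(\psi) \to h(\psi)$ as $m \to \infty$ for every $\psi$, and $\bigcup_{m=1}^\infty \{\psi\colon h_m(\psi) < 1\}$ is relatively
compact. Then $\mathfrak{F}(h_m) \to \mathfrak{F}(h)$ as $m \to \infty$.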


Problem 41. Prove Proposition 16, making sure that your proof shows where the 
relative compactness hypothesis is used. Show that a false statement is obtained 
if this hypothesis is removed. 


Problem 42. Proposition 16 might be stated concisely as: All probability gener- 
ating functionals are ‘continuous’. If you like topology, describe the neighborhood 
structure of the weakest topology on the domain of a probability generating func- 
tional consistent with this concise version of Proposition 16. 


29.6. † Operations on point processes


There is a natural addition for $\overline{\mathbb{Z}}^+$-valued Radon measures on a locally compact
Polish space: $(\mu_1 + \mu_2)(B) = \mu_1(B) + \mu_2(B)$. (The collection of $\overline{\mathbb{Z}}^+$-valued
Radon measures forms a commutative semigroup under this operation with the
zero measure being the identity.) In terms of multisets this operation corresponds
to ‘union’ with an appropriate interpretation regarding multiplicities.
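
For measures with finitely many atoms, the multiset interpretation of this addition can be made concrete. The sketch below is only an illustration: it assumes a measure is stored as a Python `Counter` mapping each point to its (finite) multiplicity, and the helper name `measure_of` is hypothetical.

```python
from collections import Counter


def measure_of(mu, subset):
    """mu(B) for a multiset mu stored as a Counter {point: multiplicity}."""
    return sum(mult for point, mult in mu.items() if point in subset)


mu1 = Counter({1: 2, 4: 1})      # the multiset {1, 1, 4}
mu2 = Counter({1: 1, 3: 1})      # the multiset {1, 3}
mu_sum = mu1 + mu2               # multiplicities add: {1: 3, 4: 1, 3: 1}
```

Here `measure_of(mu_sum, B)` equals `measure_of(mu1, B) + measure_of(mu2, B)` for every set `B`, and the empty `Counter()` plays the role of the zero measure.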


Theorem 17. Let $X$ and $Y$ be independent point processes on a locally
compact Polish space $\Psi$. The probability generating functional of $X + Y$ is the product
of the probability generating functionals of $X$ and $Y$.


Problem 43. Prove the preceding theorem. 


Problem 44. Let $(Z_k\colon k = 1,2,\dots)$ be an iid sequence of random variables taking
values in a locally compact Polish space. Show that the probability generating
functional of the point process $\delta_{Z_1} + \cdots + \delta_{Z_n}$ is the functional $h \mapsto [E(h \circ Z_1)]^n$.


Let $\Psi$ and $\Phi$ be locally compact Polish spaces and let $g\colon \Psi \to \Phi$ be a function
satisfying: $g^{-1}(C)$ is compact if $C$ is a compact subset of $\Phi$. (Such a function is
known to be continuous.) The function $g$ induces a function $\hat{g}$ from the space of
$\overline{\mathbb{Z}}^+$-valued Radon measures on $\Psi$ to the space of $\overline{\mathbb{Z}}^+$-valued Radon measures on
$\Phi$: $(\hat{g}(\mu))(A) = \mu(g^{-1}(A))$ for Borel $A \subseteq \Phi$.
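
For a measure with finitely many atoms, the induced map amounts to moving each atom to its image and adding up multiplicities. The following sketch is illustrative only: it assumes the measure is stored as a `Counter` of multiplicities, and `push_forward` is a hypothetical name.

```python
from collections import Counter


def push_forward(mu, g):
    """Return the image measure: (g_hat(mu))(A) = mu(g^{-1}(A))."""
    image = Counter()
    for point, mult in mu.items():
        image[g(point)] += mult   # all atoms in g^{-1}{y} pile up at y
    return image


mu = Counter({-2: 1, 2: 1, 3: 2})      # atoms at -2, 2, and a double atom at 3
image = push_forward(mu, abs)          # g(x) = |x| on the integers
```

In this example the atoms at $-2$ and $2$ merge into an atom of multiplicity 2 at the point 2, so the image of a random set may be a random multiset.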




Proposition 18. The function $\hat{g}$ defined above is a measurable function.


Problem 45. Prove the preceding proposition. 


For locally compact Polish spaces $\Psi$ and $\Phi$, Proposition 18 implies that a
function $g\colon \Psi \to \Phi$ for which $g^{-1}(C)$ is compact whenever $C$ is compact transforms
a point process $X$ in $\Psi$ into a point process $\hat{g} \circ X$ on $\Phi$.


Problem 46. For $X$ and $\hat{g} \circ X$ as above, find a relation between their intensity
measures and also between their probability generating functionals. Also, prove
that $\hat{g} \circ X$ is Poisson if $X$ is. Finally, give an example that shows that $\hat{g} \circ X$ might
be Poisson even if $X$ is not.


Problem 47. Let $X$ be a point process having intensity measure equal to Lebesgue
measure in $\mathbb{R}^d$. For $x \in \mathbb{R}^d$, let $g(x) = |x|$. Calculate the intensity measure of the
point process $\hat{g} \circ X$.


29.7. ‡ Convergence in distribution for point processes


A sequence $(\mu_n\colon n = 1,2,\dots)$ of Radon measures on a locally compact Polish
space $\Psi$ is said to converge to a Radon measure $\mu$ if $\int_\Psi f \, d\mu_n \to \int_\Psi f \, d\mu$ as
$n \to \infty$ for every continuous function $f\colon \Psi \to \mathbb{R}$ for which the set $\{x\colon f(x) \ne 0\}$
is relatively compact. It can be shown that there exists a countable sequence
$(f_m\colon m = 1,2,\dots)$ of such functions having the property: $\mu_n \to \mu$ if and only if
$\int_\Psi f_m \, d\mu_n \to \int_\Psi f_m \, d\mu$ for every $m$. It can then be shown that the function

$$(29.7) \qquad (\mu,\nu) \mapsto \sum_{m=1}^\infty 2^{-m} \Bigl( \Bigl| \int_\Psi f_m \, d\mu - \int_\Psi f_m \, d\nu \Bigr| \wedge 1 \Bigr)$$

turns the space of Radon measures into a Polish space in which convergence is
the same concept as that introduced in the first sentence of this section. This
Polish space and corresponding Borel $\sigma$-field is the same as $(\mathcal{M}, \mathcal{G})$, introduced
in Section 1. Moreover, the measurable subset of $\overline{\mathbb{Z}}^+$-valued Radon measures is
a closed subset of the Polish space of all Radon measures, and thus is itself a
Polish space.

The comments above indicate that we may speak of almost sure convergence, 
convergence in probability, and convergence in distribution for sequences of point 
processes. 


Problem 48. Supply some of the proofs relevant to the preceding discussion. 




Theorem 19. A family $\mathcal{Q}$ of distributions on the space of $\overline{\mathbb{Z}}^+$-valued Radon
measures on some locally compact Polish space $\Psi$ is relatively sequentially
compact if and only if for every compact $C \subseteq \Psi$,

$$(29.8) \qquad Q\{\mu\colon \mu(C) > z\} \to 0 \quad \text{as } z \to \infty,$$

uniformly in $Q \in \mathcal{Q}$.


PARTIAL PROOF. We leave to the reader the proof that relative sequential
compactness implies the uniform convergence in (29.8). For the opposite
direction we assume that (29.8) holds uniformly for each compact $C$.

Let $\varepsilon > 0$. Choose $f_m$, $m = 1,2,\dots$, as in the paragraph leading to (29.7),
and denote the compact closure of $\{\psi\colon f_m(\psi) \ne 0\}$ by $C_m$. For each $m$ choose
$z_m$ so that $Q\{\mu\colon \mu(C_m) > z_m\} < \varepsilon 2^{-m}$ for all $Q \in \mathcal{Q}$. Let

$$A = \bigcap_{m=1}^\infty \{\mu\colon \mu(C_m) \le z_m\}.$$

Clearly, $Q(A) > 1 - \varepsilon$ for $Q \in \mathcal{Q}$.

For any $m$ and any sequence in $A$, there is a subsequence $(\mu_n\colon n = 1,2,\dots)$
and an integer $z \le z_m$ such that $\mu_n(C_m) = z$ for all $n$ and either $z = 0$ or the
sequence $(z^{-1}\mu_n\colon n = 1,2,\dots)$ of probability measures on $C_m$ converges. In
either case the sequence $(\int f_m \, d\mu_n\colon n = 1,2,\dots)$ is Cauchy. In view of (29.7)
we can use the Cantor diagonalization procedure to show that any sequence in
$A$ has a convergent subsequence. Hence $A$ is compact. So, $\mathcal{Q}$ is uniformly tight,
and therefore it is relatively sequentially compact by the Prohorov Theorem.


Problem 49. Complete the proof of Theorem 19 by showing that relative sequential 
compactness implies (29.8) uniformly. 


The next theorem is our motivation for having only continuous functions in 
the domain of probability generating functionals. 


Theorem 20. [Continuity (for Probability Generating Functionals)] A
sequence of point processes converges in distribution to a point process if and
only if the corresponding sequence of probability generating functionals converges
pointwise (that is, function-wise) to a functional $\mathfrak{F}$ that satisfies: if $h_m$ is in
the domain of $\mathfrak{F}$ for each $m$, $\bigcup_{m=1}^\infty \{\psi\colon h_m(\psi) < 1\}$ is relatively compact, and
$h_m(\psi) \to 1$ as $m \to \infty$ for each $\psi$, then $\mathfrak{F}(h_m) \to 1$ as $m \to \infty$. In this case $\mathfrak{F}$
is the probability generating functional of the limiting point process.


* Problem 50. Prove Theorem 20. 




Corollary 21. A sequence of Poisson point processes on a locally compact
Polish space converges in distribution if and only if the corresponding sequence
of intensity measures converges to a Radon measure $\nu$, in which case the limiting
point process is Poisson with intensity measure $\nu$.


Problem 51. Prove the preceding corollary. 


We conclude this chapter with a nice application of Theorem 20. The follow- 
ing theorem can be described informally as follows: given an intensity measure, 
‘randomly place’ n points in a set of ‘size’ n; for large n, the resulting point pro- 
cess is approximately Poisson with the given intensity measure. It is interesting 
to note that the independence that is inherent in the limiting Poisson point pro- 
cess X is absent from each of the point processes in the sequence that converges 
to X. 


Theorem 22. Let $\nu$ be an infinite Radon measure on a locally compact Polish
space $\Psi$ and suppose there exist open sets $A_1 \subseteq A_2 \subseteq \cdots \nearrow \Psi$ with $\nu(A_n) = n$
for each $n$. For each $n = 1,2,\dots$, let $(Y_{k,n}: 1 \le k \le n)$ be an iid sequence with
common distribution $\nu_n$ on $\Psi$ defined to equal $\frac{1}{n}$ times the restriction of $\nu$ to $A_n$.
Define point processes $X_n$ on $\Psi$ by

$$X_n = \sum_{k=1}^{n} \delta_{Y_{k,n}}.$$

Then the sequence $(X_n: n = 1,2,\dots)$ converges in distribution to the Poisson
point process $X$ having intensity measure $\nu$.


PROOF. Let $h$ be in the domain of the probability generating functional $\mathcal{G}$ of a
Poisson point process having intensity measure $\nu$. Then $\{\psi: h(\psi) < 1\} \subseteq A_n$ for
all sufficiently large $n$ because $\{\psi: h(\psi) < 1\}$ is relatively compact. For such $n$,
we use Problem 44 to obtain formulas for the probability generating functional
of $X_n$ evaluated at $h$:

$$\Bigl(\int h\,d\nu_n\Bigr)^n = \Bigl(1 - \int (1-h)\,d\nu_n\Bigr)^n = \Bigl(1 - \frac{1}{n}\int (1-h)\,d\nu\Bigr)^n.$$

Let $n \to \infty$ to obtain $\mathcal{G}(h)$. An appeal to Theorem 20 completes the proof. □
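
The limit used in this proof can be checked numerically in the simplest setting. The sketch below assumes, purely for illustration, that the space is $[0,\infty)$, that $\nu$ is Lebesgue measure, and that $A_n = [0,n)$; then the number of the $n$ points falling in $[0,1)$ is Binomial$(n, 1/n)$, and the Poisson limit has mean 1. The function names are hypothetical and not taken from the text.

```python
import math

def binomial_pmf(n, p, k):
    """P[Binomial(n, p) = k]; zero when k > n."""
    if k > n:
        return 0.0
    return math.comb(n, k) * p**k * (1.0 - p)**(n - k)

def poisson_pmf(mean, k):
    """P[Poisson(mean) = k]."""
    return math.exp(-mean) * mean**k / math.factorial(k)

def total_variation(n, kmax=40):
    """Total variation distance between Binomial(n, 1/n) and Poisson(1).

    The sum is truncated at kmax; the neglected Poisson mass is of order 1/kmax!.
    """
    p = 1.0 / n
    return 0.5 * sum(abs(binomial_pmf(n, p, k) - poisson_pmf(1.0, k))
                     for k in range(kmax + 1))

def pgfl_n(c, n):
    """(1 - c/n)^n, the last expression in the proof with c = integral of (1 - h) d(nu)."""
    return (1.0 - c / n) ** n

def pgfl_limit(c):
    """exp(-c), the Poisson value of the generating functional for the same c."""
    return math.exp(-c)

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, total_variation(n), pgfl_n(2.0, n), pgfl_limit(2.0))
```

The distances printed decrease roughly like a constant times $1/n$, in line with the informal description preceding Theorem 22.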


Problem 52. For the case where v is Lebesgue measure on R?, describe appropriate 
choices for An in Theorem 22 and find the corresponding vn. 


CHAPTER 30 
Lévy Processes 


In this chapter we treat continuous-time analogues of random walks, while
restricting ourselves to the state spaces $\overline{\mathbb{R}}^+$ and $\mathbb{R}$. In the $\overline{\mathbb{R}}^+$-setting, these ‘Lévy
processes’ can be constructed using Poisson point processes. Matters are slightly
more complicated in the $\mathbb{R}$-setting; in addition to Poisson point processes,
Brownian motion and a limiting procedure are needed. The main results of this chapter
show that there is a Lévy process for each infinitely divisible distribution.
After these results are proved, we extend to the continuous-time setting some of
the theory that was developed earlier for random walks. The chapter concludes
with a section in which a few ‘sample function properties’ of Lévy processes are
discussed.


30.1. Measurable spaces of right-continuous functions 


The basic random variables in this chapter are function-valued. The domain of
these functions is $[0,\infty)$. Often we will use ‘time’ to refer to this domain, as well
as to an individual member of the domain.

It turns out that the spaces of all $\overline{\mathbb{R}}^+$- and $\mathbb{R}$-valued functions on $[0,\infty)$ are
quite cumbersome. Therefore, we restrict our attention to certain nice subsets of
these spaces. In the $\overline{\mathbb{R}}^+$-setting, we consider the space $D^+[0,\infty)$ of increasing,
right-continuous functions from $[0,\infty)$ to $\overline{\mathbb{R}}^+$. For the $\mathbb{R}$-setting, we introduce
a new term: a function $f: [0,\infty) \to \mathbb{R}$ is cadlag if $f$ is right-continuous, and
for every $t \in (0,\infty)$, the limit $f(t-) = \lim_{s \nearrow t} f(s)$ exists. (The term ‘cadlag’ is
an acronym for the French phrase ‘continues à droite, limites à gauche’ which
means ‘continuous on the right, limits on the left’.) The space of all $\mathbb{R}$-valued
cadlag functions is denoted by $D[0,\infty)$. (Note that the existence of left limits is
automatic for the functions in $D^+[0,\infty)$, since they are increasing.)
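
The distinction between $f(t)$ and $f(t-)$ is easiest to see for step functions. The following is a minimal sketch, assuming a piecewise-constant function that starts at 0 and has finitely many jumps at distinct positive times; the class name `StepPath` and its methods are hypothetical.

```python
from bisect import bisect_left, bisect_right

class StepPath:
    """Piecewise-constant cadlag function on [0, infinity), equal to 0 before its first jump."""

    def __init__(self, jump_times, jump_sizes):
        pairs = sorted(zip(jump_times, jump_sizes))
        self.times = [s for s, _ in pairs]
        # levels[i] is the value of the function after the first i jumps
        self.levels = [0.0]
        for _, size in pairs:
            self.levels.append(self.levels[-1] + size)

    def value(self, t):
        """f(t): a jump occurring exactly at time t is included (right-continuity)."""
        return self.levels[bisect_right(self.times, t)]

    def left_limit(self, t):
        """f(t-) for t > 0: a jump occurring exactly at time t is excluded."""
        return self.levels[bisect_left(self.times, t)]

if __name__ == "__main__":
    f = StepPath([1.0, 2.5], [2.0, -0.5])
    print(f.left_limit(1.0), f.value(1.0))   # the values just before and at the first jump
```

At a jump time the two methods disagree, and their difference is the jump size; at every other time they agree.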

We regard both of these spaces as measurable spaces, with corresponding
$\sigma$-fields generated by sets of the form $\{f: f(t) \in B\}$ for $t \in [0,\infty)$ and Borel
$B \subseteq \mathbb{R}$. The $\sigma$-field will usually not be mentioned explicitly; rather it is implicit
in a phrase such as ‘the measurable space $D[0,\infty)$’. The notation $f_t$ can be used
in place of $f(t)$; in fact, when the function itself is random we will prefer the
subscript position for the time variable. The following two problems treat issues
of pointwise convergence for functions in $D[0,\infty)$ and $D^+[0,\infty)$.


Problem 1. Give an example to show that the pointwise limit of a sequence of
functions in $D[0,\infty)$ need not be a member of $D[0,\infty)$.


Problem 2. Prove that if a sequence of functions in $D[0,\infty)$ converges uniformly
on every bounded set, then the limit is a member of $D[0,\infty)$.


Problem 3. Prove that $C[0,\infty)$ is a measurable subset of $D[0,\infty)$, and comment
on the significance of this fact in conjunction with Problem 3 of Chapter 18.


30.2. Definition of Lévy process 


Recall from Chapter 19 that a Wiener process on $[0,\infty)$ is a $C[0,\infty)$-valued
random variable with stationary independent increments. The following definition,
inspired by Proposition 3 of Chapter 11 and the discussion preceding that
proposition, says that a Lévy process is a $D[0,\infty)$-valued or $D^+[0,\infty)$-valued
random variable with stationary independent increments. Wiener processes are
examples of Lévy processes.


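Definition 1. A Lévy process in $\mathbb{R}$ or $\overline{\mathbb{R}}^+$, respectively, is a $D[0,\infty)$- or
$D^+[0,\infty)$-valued random variable $Y$ for which $Y_0 = 0$ a.s. and

(30.1)   $$P[Y_{t_j} - Y_{t_{j-1}} \in C_j,\ 1 \le j \le n] = \prod_{j=1}^{n} P[Y_{t_j - t_{j-1}} \in C_j]$$

whenever $C_1, C_2, \dots, C_n$ are Borel subsets of $\mathbb{R}$ and
$0 = t_0 < t_1 < t_2 < \cdots < t_n$.


In the $\overline{\mathbb{R}}^+$-setting it is understood that the undefined quantity $\infty - \infty$ is not
a member of any Borel subset of $\mathbb{R}$. A Lévy process in $\overline{\mathbb{R}}^+$ is also called a
subordinator.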


Just as with random walks in $\mathbb{R}$, we have defined Lévy processes so that they
always start at the value 0 at time $t = 0$. Sometimes the term ‘Lévy process’ (or
the equivalent term ‘process with stationary independent increments’) is used
more generally to include processes $\tilde{Y}$ satisfying $\tilde{Y}_t = Y_t + X$, where $Y$ is a Lévy
process in the sense of Definition 1, and $X$ is an $\mathbb{R}$-valued random variable that
is independent of $Y$.


Problem 4. Let $Y$ be a Lévy process in $\mathbb{R}$ and let $s \in [0,\infty)$. Prove that the
process
$$t \rightsquigarrow Y_{s+t} - Y_s$$
is a Lévy process that has the same distribution as $Y$ itself and is independent of
$\sigma(Y_r: r \le s)$.


Problem 5. Let $Y$ be a Lévy process and fix $a, b > 0$. Show that the random
sequence $(Y_{a+bn} - Y_{bn}: n = 0,1,2,\dots)$ is a stationary sequence (defined in
Chapter 28). Calculate the mean vector and covariance matrix of this stationary
sequence for the case in which $Y_1$ has finite mean $\mu$ and finite variance $\sigma^2$.


Problem 6. Let $X$ be a Poisson point process on $(0,\infty)$ with intensity measure $\kappa\lambda$,
where $\kappa$ is a positive real constant and $\lambda$ is Lebesgue measure. Define a $D^+[0,\infty)$-valued
random variable $Y$ by $Y_t = y \cdot X(0,t]$ for $t \ge 0$, where $y$ is a constant in $\overline{\mathbb{R}}^+$
and it is to be understood in case $y = \infty$ that $\infty \cdot 0 = 0$. Show that $Y$ is a Lévy
process in $\overline{\mathbb{R}}^+$, and that for each $t \ge 0$, the distribution of $Y_t$ is that of the product
of $y$ and a Poisson random variable with mean $\kappa t$.


Problem 7. Show that the sum of two independent Lévy processes is a Lévy
process, both in the $\mathbb{R}$- and in the $\overline{\mathbb{R}}^+$-setting. (Take care that your proof
accommodates the process constructed in the preceding problem for the case $y = \infty$.)


The preceding exercise shows that Poisson point processes on $(0,\infty)$ can be
used to construct certain simple Lévy processes. The following example shows
that by using Poisson point processes on $(0,\infty) \times \mathbb{R}$, a much larger class of
Lévy processes can be constructed. This example is a special case of the main
representation theorem for $\mathbb{R}$-valued Lévy processes, to be proved in the next
section.


Example 1. [Compound Poisson processes] Let $Q$ be an arbitrary distribution
on $\mathbb{R}$, and let $X$ be a Poisson point process on $(0,\infty) \times \mathbb{R}$ with intensity
measure $\kappa(\lambda \times Q)$, where $\kappa > 0$ and $\lambda$ denotes Lebesgue measure on $[0,\infty)$. We
define $Y = (Y_t: t \in [0,\infty))$ by

$$Y_t = \int_{(0,t] \times \mathbb{R}} x\,X(d(s,x)) = \int_{\mathbb{R}} x\,X((0,t] \times dx) = \sum_{x \in \mathbb{R}} x\,X((0,t] \times \{x\}),$$

where $(0,t] = \emptyset$ in case $t = 0$, and the last summation is meaningful because
with probability 1, all but finitely many terms in it equal 0. It is now easy to
see that with probability one, $[t \rightsquigarrow Y_t]$ is cadlag, has its discontinuities limited
to a discrete set of times, and is constant on intervals between discontinuities.

That $Y$ has independent increments follows immediately from the independence
properties of Poisson point processes. That it has stationary increments
follows from Proposition 13 of Chapter 29 and the invariance of $\kappa(\lambda \times Q)$ under
translations in the direction of the first coordinate axis. Therefore (30.1) is
satisfied, and hence $Y$ is a Lévy process.

According to Proposition 12 of Chapter 29, $X$ can be viewed as a random
discrete set rather than a random Radon measure, because the intensity measure
of $X$ assigns the value 0 to every one-point set. From this perspective, the
formula for $Y$ may be rewritten as

$$Y_t = \sum_{(s,x) \in X \cap ((0,t] \times \mathbb{R})} x.$$

That is, to find $Y_t$, sum the second coordinates of the points in $X$ whose first
coordinates are $\le t$. Inversion of this formula gives an expression for the random
set $X$ in terms of $Y$:

$$X = \{(s, Y_s - Y_{s-}): Y_s \ne Y_{s-}\}.$$

We conclude this example by determining the distribution of $Y_t$ for each $t$. In
the definition of $Y_t$, only the restriction of $X$ to $(0,t] \times \mathbb{R}$ is relevant. Denote
this restriction by $X_t$, which is a Poisson point process having a finite intensity
measure. From the proof of Lemma 9 of Chapter 29, it is clear that this random
finite set is distributed like a random set of the form $\{(T_1,V_1),\dots,(T_N,V_N)\}$,
where $N$ is Poisson with mean $\kappa t$, and $(T_1,V_1), (T_2,V_2), \dots$ is an iid sequence
of $(0,t] \times \mathbb{R}$-valued random variables that is independent of $N$ and has common
distribution equal to the intensity measure of $X_t$ divided by $\kappa t$, that is, $t^{-1}\lambda \times Q$
with $\lambda$ restricted to $(0,t]$.
Therefore $Y_t$ has the same distribution as $V_1 + \cdots + V_N$, which we recognize from
Example 2 of Chapter 16 as a compound Poisson random variable with Lévy
measure $\kappa t Q$.
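
The description just given translates directly into a simulation recipe. The following is a minimal sketch, not a definitive implementation: it draws $N$ as a Poisson variate with mean $\kappa t$, then $N$ iid pairs (uniform time, jump with distribution $Q$), and evaluates $Y_s$ by summing the jumps with time at most $s$. The function names and the way $Q$ is passed (as a sampling callback) are assumptions made for illustration.

```python
import random

def sample_poisson(mean, rng):
    """Poisson variate: the number of unit-rate exponential arrivals in [0, mean]."""
    count = 0
    total = rng.expovariate(1.0)
    while total <= mean:
        count += 1
        total += rng.expovariate(1.0)
    return count

def compound_poisson_points(kappa, t, sample_jump, rng):
    """Points (T_k, V_k) of the restriction of X to (0, t] x R, sorted by time.

    N is Poisson with mean kappa * t; the times are uniform on the interval and
    the jumps are drawn by sample_jump(rng), which plays the role of Q.
    """
    n = sample_poisson(kappa * t, rng)
    return sorted((rng.uniform(0.0, t), sample_jump(rng)) for _ in range(n))

def path_value(points, s):
    """Y_s: the sum of the second coordinates of points whose first coordinate is <= s."""
    return sum(x for (u, x) in points if u <= s)

if __name__ == "__main__":
    rng = random.Random(2024)
    points = compound_poisson_points(2.0, 5.0, lambda r: r.gauss(0.0, 1.0), rng)
    for s in (0.0, 1.0, 2.5, 5.0):
        print(s, path_value(points, s))
```

With $Q = \delta_1$ the value at time $t$ is simply the number of points, so its sample mean over many runs should be close to $\kappa t$.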


The preceding example is easily modified to accommodate the $\overline{\mathbb{R}}^+$-setting. In
both settings, the resulting processes are called compound Poisson processes. For
the case where $Q$ is a delta distribution, the adjective ‘compound’ is omitted.
The standard Poisson processes are obtained by setting $Q = \delta_1$.


Remark 1. If $Q$ in Example 1 has the property that $Q\{0\} \in (0,1)$, then if
one introduces $\tilde{\kappa} = \kappa Q(\mathbb{R} \setminus \{0\})$ and $\tilde{Q}: B \rightsquigarrow \frac{1}{Q(\mathbb{R} \setminus \{0\})}\,Q(B \setminus \{0\})$, one obtains
a pair $(\tilde{\kappa}, \tilde{Q})$ that is different from $(\kappa, Q)$ but which gives a compound Poisson
process with the same distribution. Therefore, in order that there be a one-to-one
correspondence between the set of pairs $(\kappa, Q)$ and the set of distributions
of compound Poisson processes, it is sometimes required for the construction of
Example 1 that $Q$ be a measure on $\mathbb{R} \setminus \{0\}$. Similarly, for the $\overline{\mathbb{R}}^+$-setting one
might require that $Q$ be a probability measure on $(0,\infty]$, not just on $[0,\infty]$.
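
For a distribution $Q$ with finite support the renormalization in Remark 1 is a short computation. The sketch below assumes, for illustration only, that $Q$ is given as a dictionary from jump sizes to probabilities; the function name `normalize_pair` is hypothetical.

```python
import math

def normalize_pair(kappa, q):
    """Return the pair (kappa-tilde, Q-tilde) of Remark 1 for a finitely supported Q.

    q maps each jump size to its probability. The mass at 0 is discarded,
    the remaining probabilities are rescaled to sum to 1, and the rate is
    reduced by the same factor.
    """
    mass_off_zero = sum(p for x, p in q.items() if x != 0)
    if mass_off_zero <= 0:
        raise ValueError("Q must assign positive mass to nonzero jump sizes")
    kappa_tilde = kappa * mass_off_zero
    q_tilde = {x: p / mass_off_zero for x, p in q.items() if x != 0}
    return kappa_tilde, q_tilde

if __name__ == "__main__":
    kappa_tilde, q_tilde = normalize_pair(4.0, {0: 0.25, 1: 0.5, -1: 0.25})
    print(kappa_tilde, q_tilde)
    # The product of rate and jump distribution is unchanged away from 0:
    print(math.isclose(kappa_tilde * q_tilde[1], 4.0 * 0.5))
```

The product $\tilde{\kappa}\tilde{Q}$ agrees with $\kappa Q$ on sets not containing 0, which is why both pairs give the same process.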


Problem 8. [Alternate construction of compound Poisson processes] Let $t \rightsquigarrow N_t$ be
a standard Poisson process. Independent of $N$ let $(V_1, V_2, \dots)$ be an iid sequence
of $\mathbb{R}$-valued random variables with common distribution $Q$. Set

$$Z_t = \sum_{k=1}^{N_t} V_k.$$

Show that $t \rightsquigarrow Z_t$ is a compound Poisson process in $\mathbb{R}$, and that the distribution
of $Z_t$ is compound Poisson with Lévy measure $E(N_t)Q = tE(N_1)Q$.




Problem 9. Let $Z$ be a compound Poisson process in $\mathbb{R}$, and let $T_1 < T_2 < \cdots$
be the (random) times at which the random function $Z$ has discontinuities. Show
that the random variables $T_1, T_2 - T_1, T_3 - T_2, \dots$ are iid exponentially distributed
random variables, and find their common mean in terms of the parameter $\kappa$ of
Example 1.


* Problem 10. In the $\overline{\mathbb{R}}^+$-setting, obtain the moment generating function of $Y_t$,
constructed as in Example 1, by applying the formula for the probability generating
functional of the Poisson point process $X$ (Proposition 15 in an optional section
of Chapter 29) to appropriate functions in its domain. Use the same method to
calculate the characteristic function of $Y_t$ in the $\mathbb{R}$-setting.


As the following proposition shows, the appearance of an infinitely divisible 
distribution in Example 1 was to have been expected. 


Proposition 2. Let $Z$ be a Lévy process in either $\mathbb{R}$ or $\overline{\mathbb{R}}^+$. Then $Z_t$ is
infinitely divisible for every $t$.


Problem 11. Prove the preceding proposition. Hint: Care is needed for treating
the $\overline{\mathbb{R}}^+$-setting.


30.3. Construction of Lévy processes 


In view of Proposition 2, the question arises: For every infinitely divisible
distribution $R$ does there exist a Lévy process whose distribution at time 1 equals
$R$? We will build on Example 1 to give affirmative answers for both the $\overline{\mathbb{R}}^+$- and
$\mathbb{R}$-settings. Since we will be using Lévy measures as intensity measures of Poisson
point processes, we need to make $\overline{\mathbb{R}}^+ \setminus \{0\}$ and $\mathbb{R} \setminus \{0\}$ into locally compact
Polish spaces in such a way that all Lévy measures are Radon measures and the
Borel sets are the usual measurable sets.

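For the $\overline{\mathbb{R}}^+$-setting, define the distance between $w, x \in \overline{\mathbb{R}}^+ \setminus \{0\}$ to be $\left|\frac{1}{x} - \frac{1}{w}\right|$
(where, of course, $\frac{1}{\infty} = 0$). Notice that any compact set in $\overline{\mathbb{R}}^+ \setminus \{0\}$ is a subset
of $[\varepsilon, \infty]$ for some $\varepsilon > 0$ and is thus assigned a finite value by each Lévy measure
for $\overline{\mathbb{R}}^+$.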

For the $\mathbb{R}$-setting, we define the distance between $w, x \in \mathbb{R} \setminus \{0\}$ to be


1 1 
eee eee 


Since any compact subset of $\mathbb{R} \setminus \{0\}$ is a subset of $(-\infty, -\varepsilon] \cup [\varepsilon, \infty)$ for some
$\varepsilon > 0$, Lévy measures for $\mathbb{R}$ are Radon measures on $\mathbb{R} \setminus \{0\}$.


Problem 12. Why would it not work to define the distance between $x$ and $w$ in
$\mathbb{R} \setminus \{0\}$ to equal $\left|\frac{1}{x} - \frac{1}{w}\right|$?




Theorem 3. [Itô Representation ($\overline{\mathbb{R}}^+$-version)] Let $\xi \ge 0$ and $X$ be a Poisson
point process in $(0,\infty) \times (0,\infty]$ whose intensity measure is $\lambda \times \nu$, where $\lambda$ denotes
Lebesgue measure and $\nu$ is a Lévy measure for $\overline{\mathbb{R}}^+$. Then $Z$ defined by

(30.2)   $$Z_t = \xi t + \int_{(0,\infty]} x\,X((0,t] \times dx)$$

is a Lévy process in $\overline{\mathbb{R}}^+$, and the distribution of $Z_t$ is the infinitely divisible
distribution corresponding to $(t\xi, t\nu)$ via the Lévy-Khinchin Representation Theorem
for the $\overline{\mathbb{R}}^+$-setting.


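PROOF. We first assume that $\nu\{\infty\} = 0 = \xi$. For $\varepsilon > 0$, set

$$\nu_\varepsilon(B) = \nu([\varepsilon,\infty) \cap B), \quad \text{Borel } B \subseteq (0,\infty).$$

An application of Example 1 to the intensity measure

$$\lambda \times \nu_\varepsilon = \nu_\varepsilon(0,\infty)\Bigl(\lambda \times \frac{1}{\nu_\varepsilon(0,\infty)}\,\nu_\varepsilon\Bigr)$$

shows that

$$Z_t^\varepsilon = \int_{[\varepsilon,\infty)} x\,X((0,t] \times dx)$$

is compound Poisson with Lévy measure $t\nu_\varepsilon$. As $\varepsilon \searrow 0$, $Z_t^\varepsilon \nearrow Z_t$ a.s., whether
finite or infinite. Thus, we also have convergence in distribution, which in terms
of moment generating functions can be written as

$$\lim_{\varepsilon \searrow 0} \exp\Bigl(-t\int_{[\varepsilon,\infty)} (1 - e^{-ux})\,\nu(dx)\Bigr) = E(e^{-uZ_t}).$$

Thus $Z_t$ is finite a.s. and is infinitely divisible with Lévy measure $t\nu$ and shift 0.

Since $Z_s - Z_s^\varepsilon \le Z_t - Z_t^\varepsilon$ for $s \le t$, the convergence $Z^\varepsilon \to Z$ is uniform on
bounded intervals. By Problem 2, $Z$ is cadlag a.s. The stationary increment
property is clearly preserved under passage to the limit, so the theorem has now
been proved under the assumption that $\xi = 0$ and $\nu\{\infty\} = 0$.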

In case $\nu\{\infty\} > 0$, $Z$ may be written as the sum of random functions $\tilde{Z}$ and
$V$, where

$$\tilde{Z}_t = \int_{(0,\infty)} x\,X((0,t] \times dx) \quad\text{and}\quad V_t = \infty \cdot X((0,t] \times \{\infty\}).$$

(As usual, we understand $\infty \cdot 0$ to equal 0.) By the part of the theorem already
proved, $\tilde{Z}$ is a Lévy process, and for each $t \ge 0$, $\tilde{Z}_t$ is infinitely divisible with
shift equal to 0 and Lévy measure equal to the restriction of $t\nu$ to $(0,\infty)$. By
Problem 6, $V$ is also a Lévy process, and for $t \ge 0$, the distribution of $V_t$
is described in that problem. The independence properties of Poisson point
processes imply that $\tilde{Z}$ and $V$ are independent. By Problem 7, their sum $Z$ is
a Lévy process. It is easily checked from the properties of $\tilde{Z}$ and $V$ that $Z_t$ has
the desired distribution for each $t \ge 0$. It is also easy to see that the addition of
a deterministic linear function $t \rightsquigarrow \xi t$ accommodates arbitrary $\xi$. □
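
The truncation step in this proof can be observed numerically. The sketch below assumes, purely for illustration, the Lévy measure with density $y \mapsto e^{-y}/y$ on $(0,\infty)$, for which the untruncated integral $\int_{(0,\infty)} (1 - e^{-uy})\,\nu(dy)$ equals $\log(1+u)$ by the Frullani integral. It approximates the truncated integral over $[\varepsilon, \infty)$ by Simpson's rule after the substitution $y = e^s$; the function name and the cutoff `upper` are hypothetical choices.

```python
import math

def truncated_exponent(u, eps, upper=60.0, steps=20000):
    """Simpson approximation of the integral over [eps, upper] of
    (1 - exp(-u*y)) * exp(-y) / y dy.

    After the substitution y = exp(s) the factor 1/y is absorbed into ds.
    The mass beyond `upper` is of order exp(-upper) and is ignored.
    `steps` must be even.
    """
    a, b = math.log(eps), math.log(upper)
    h = (b - a) / steps

    def integrand(s):
        y = math.exp(s)
        # -expm1(-u*y) equals 1 - exp(-u*y) and stays accurate for tiny y
        return -math.expm1(-u * y) * math.exp(-y)

    total = integrand(a) + integrand(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(a + i * h)
    return total * h / 3.0

if __name__ == "__main__":
    u = 2.0
    for eps in (1e-1, 1e-2, 1e-4, 1e-6):
        print(eps, truncated_exponent(u, eps), math.log(1.0 + u))
```

The printed values increase toward $\log(1+u)$ as $\varepsilon$ decreases, mirroring the monotone convergence of the truncated processes in the proof.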




The Lévy measure $\nu$ and shift $\xi$ of the $\overline{\mathbb{R}}^+$-valued random variable $Z_1$ in the
preceding theorem are also called the Lévy measure and drift, respectively, of
the Lévy process $Z$.


* Problem 13. Let $\nu$ be the Lévy measure of a Lévy process $Z$ in $\overline{\mathbb{R}}^+$. For each
$t \in (0,\infty)$ and $y \in (0,\infty]$ calculate the probability, in terms of $\nu$, that there exists
$s \in (0,t]$ such that $Z_s - Z_{s-} > y$. Interpret the limit of your answer as $y \searrow 0$.


Problem 14. [Gamma processes] Let $a$ and $c$ be positive constants. For the Lévy
process $Z$ whose drift equals 0 and whose Lévy measure has density

$$y \rightsquigarrow \frac{c\,e^{-ay}}{y}, \quad y \in (0,\infty),$$

with respect to Lebesgue measure, show that for $t > 0$, the $\mathbb{R}^+$-valued random
variable $Z_t$ has the gamma density

$$y \rightsquigarrow \frac{a^{ct}\,y^{ct-1}\,e^{-ay}}{\Gamma(ct)}.$$

Problem 15. Let $Z$ be the gamma process described in the preceding problem for
$c = 1$, and define $V$ by

$$V_t = \frac{Z_t}{Z_1}, \quad t \in [0,1].$$

(Using a natural variation of notation previously introduced, we say that $V$ is a
$D^+[0,1]$-valued random variable.) For each $t \in [0,1]$, identify the distribution
of $V_t$. Find a formula for the function $(s,t) \rightsquigarrow \mathrm{Corr}(V_s, V_t)$ for $(s,t) \in (0,1)^2$,
and discuss the limiting behavior of this function at the boundary (a square) of
its domain. If you have become familiar with Dirichlet distributions (described
in an optional section of Chapter 11), discuss why the term ‘Dirichlet process’
would be appropriate for the random function $V$ had that term not already been
appropriated as a synonym for ‘Ferguson distribution’, as noted in the section of
Appendix G related to Chapter 27.


Problem 16. (This problem requires familiarity with Dirichlet distributions, which
are introduced in an optional section of Chapter 11.) Let $Z$ be the gamma process
defined in Problem 14 with $c = 1$, and let $R$ denote the distribution of the
corresponding random function $V$ described in Problem 15. Show that

$$B \rightsquigarrow R(\{v \in D^+[0,1]: yv \in B\})$$

is the conditional distribution given $Z_1 = y$ of $Z|_{[0,1]}$, the restriction of $Z$ to $[0,1]$.


Problem 17. [Negative binomial processes] Let $a$ and $c$ be positive constants. For
the Lévy process $Z$ whose drift equals 0 and whose Lévy measure has density

$$y \rightsquigarrow \frac{c\,e^{-ay}}{y}, \quad y \in \mathbb{Z}^+ \setminus \{0\},$$

with respect to counting measure, calculate the distribution of each $\mathbb{Z}^+$-valued
random variable $Z_t$.




Problem 18. Comment on similarities and differences between the gamma and 
negative binomial processes defined in Problem 14 and Problem 17. Give some 
attention to small times and also comment on large times. 


For the R-setting we have the following companion of Theorem 3. 


Theorem 4. [Itô Representation ($\mathbb{R}$-version)] Let $(X, W)$ be an independent
pair, where $W$ is a standard Wiener process and $X$ is a Poisson point process in
$(0,\infty) \times (\mathbb{R} \setminus \{0\})$ whose intensity measure is $\lambda \times \nu$, where $\lambda$ denotes Lebesgue
measure and $\nu$ is a Lévy measure for $\mathbb{R}$. Let $\eta \in \mathbb{R}$, let $\sigma \in [0,\infty)$, and let $\chi$
be the function described by Figure 16.1 of Chapter 16. Then there exists a
decreasing sequence $(\varepsilon_k: k = 1,2,\dots)$ of positive numbers converging to 0 such
that

$$t \rightsquigarrow \eta t + \sigma W_t + \lim_{k \to \infty}\Bigl[\int_{(-\infty,-\varepsilon_k] \cup [\varepsilon_k,\infty)} x\,X((0,t] \times dx) - t\int_{(-\infty,-\varepsilon_k] \cup [\varepsilon_k,\infty)} \chi(x)\,\nu(dx)\Bigr]$$

defines a Lévy process in $\mathbb{R}$ whose value at $t$ has the infinitely divisible distribution
corresponding to $(t\eta, t\sigma, t\nu)$ via the Lévy-Khinchin Representation Theorem.


PARTIAL PROOF. In order to focus on the crux of the proof we will assume
that $\eta = \sigma = \nu(-\infty,0) = 0 < \nu(0,\varepsilon)$ for every $\varepsilon > 0$, leaving it to the reader to
remove these restrictions.

Recall, from (16.6) of Chapter 16, that $\int_{(0,1]} x^2\,\nu(dx) < \infty$. For $k \in \mathbb{Z}^+$, set

$$\varepsilon_k = \sup\Bigl\{\varepsilon: \int_{(0,\varepsilon)} y^2\,\nu(dy) < 8^{-k}\Bigr\}.$$

Clearly, $\varepsilon_k \searrow 0$ as $k \to \infty$.
Define $Y^{(k)}$ by

(30.3)   $$Y_t^{(k)} = \int_{[\varepsilon_k,\infty)} x\,X((0,t] \times dx) - t\int_{[\varepsilon_k,\infty)} \chi(x)\,\nu(dx).$$

Applying Example 1 to the first term on the right and using the fact that the
second function is a nonrandom constant times $t$, we conclude that with probability 1,
$Y^{(k)}$ is a Lévy process. Moreover, the characteristic function of $Y_1^{(k)}$
equals

$$v \rightsquigarrow \exp\Bigl(-\int_{[\varepsilon_k,\infty)} \bigl(1 - e^{ivx} + iv\chi(x)\bigr)\,\nu(dx)\Bigr).$$

The limit as $k \to \infty$ equals the infinitely divisible characteristic function that
corresponds to $(0,0,\nu)$ via its Lévy-Khinchin representation.

In view of Problem 2 we can finish the proof by showing almost sure uniform
convergence on bounded intervals by the sequence $(Y^{(k)})$. We write

$$Y^{(k)} = Y^{(0)} + \sum_{j=0}^{k-1} \bigl[Y^{(j+1)} - Y^{(j)}\bigr].$$

It is enough to show that for each $t < \infty$,

(30.4)   $$\sum_{j=0}^{\infty} \sup_{s \le t} \bigl|Y_s^{(j+1)} - Y_s^{(j)}\bigr| < \infty \quad\text{a.s.}$$

By right continuity the term indexed by $j$ equals

$$\lim_{m \to \infty} \sup_{i \le 2^m} \bigl|Y_{i2^{-m}t}^{(j+1)} - Y_{i2^{-m}t}^{(j)}\bigr|.$$

By the Etemadi Inequality (Lemma 21 of Chapter 12) and the Continuity of Measure
Theorem, the probability that this quantity is larger than $2^{-j}$ is bounded
above by

(30.5)   $$4 \lim_{m \to \infty} \sup_{i \le 2^m} P\bigl[\bigl|Y_{i2^{-m}t}^{(j+1)} - Y_{i2^{-m}t}^{(j)}\bigr| > 2^{-j-2}\bigr] \le 4 \sup_{s \le t} P\bigl[\bigl|Y_s^{(j+1)} - Y_s^{(j)}\bigr| > 2^{-j-2}\bigr].$$

For $j$ large enough so that $\varepsilon_j < 1$, we obtain from Problem 14 of Chapter 16 and
our choice of $(\varepsilon_k)$ that

$$E\bigl(Y_s^{(j+1)} - Y_s^{(j)}\bigr) = 0;$$

$$\mathrm{Var}\bigl(Y_s^{(j+1)} - Y_s^{(j)}\bigr) = s\int_{[\varepsilon_{j+1},\varepsilon_j)} x^2\,\nu(dx) \le s\,8^{-j}.$$

By the Chebyshev Inequality, (30.5) is bounded by $2^{6-j}t$, which when summed
over $j$ gives a convergent series. By the Borel Lemma, only finitely many of the
events

$$\Bigl[\sup_{s \le t} \bigl|Y_s^{(j+1)} - Y_s^{(j)}\bigr| > 2^{-j}\Bigr]$$

occur, with probability 1. Therefore the series (30.4) is a.s. finite. □


Problem 19. Complete the proof of Theorem 4 by removing the restrictions on $\eta$,
$\sigma$, and $\nu$ imposed in the partial proof given above.


From a Lévy process in $\mathbb{R}$ constructed via its Itô representation, it is easy
to recover the corresponding point process $X$, viewed as a random set:

$$X = \{(s, Y_s - Y_{s-}): Y_s \ne Y_{s-}\}.$$


We have shown in this section that if $Q$ is an infinitely divisible distribution
on $\overline{\mathbb{R}}^+$ or $\mathbb{R}$, then there is a Lévy process $Z$ such that $Z_1$ has distribution $Q$.
In most cases, if the distribution $Q$ has a name, then the same name is applied
to $Z$. Thus, for example, the Lévy processes whose distributions at time 1 (and
therefore at any positive time) are stable or strictly stable are said to be stable
or strictly stable, respectively. And a standard Cauchy process is also known as
a strictly stable process with index 1. Other examples of this terminology are
found in Problem 14 and Problem 17. A major exception to this usage is the
Wiener process, which is not called a ‘normal process’.


30.4. Filtrations and stopping times 


Filtrations indexed by nonnegative integers were introduced in Chapter 11. In- 
dexing by nonnegative real numbers was treated in Chapter 19, but the random 
functions treated there were assumed to be continuous. 


Problem 20. Prove that Proposition 19 of Chapter 19 can be generalized to random
variables that are $D[0,\infty)$-valued, thus showing that Lévy processes are
progressively measurable.


In this section we will focus on Lévy processes in $\mathbb{R}$, although some of what
is done also applies to the $\overline{\mathbb{R}}^+$-setting.

Suppose that $(\mathcal{F}_t: t \in [0,\infty))$ is the minimal filtration for a Lévy process $Y$.
By Problem 4, the $D[0,\infty)$-valued random variable

(30.6)   $$t \rightsquigarrow Y_{s+t} - Y_s$$

is a Lévy process that has the same distribution as $Y$ itself and is independent
of $\mathcal{F}_s$. For a filtration that is not minimal, the preceding statement may or may
not be true. In case $Y$ is adapted to $(\mathcal{F}_t: t \in [0,\infty))$ and the process (30.6) is
independent of $\mathcal{F}_s$ for each $s$, we say that $Y$ is a Lévy process with respect to
$(\mathcal{F}_t: t \in [0,\infty))$. This terminology is also used in case the Lévy process has a
more specific name. Thus we may speak of a Wiener process or strictly stable
process with respect to a filtration.


Proposition 5. A Lévy process in $\mathbb{R}$ with respect to a filtration $(\mathcal{F}_t: t \in [0,\infty))$
is also a Lévy process with respect to the filtration $(\mathcal{F}_{t+}: t \in [0,\infty))$.


PROOF. The proof of Proposition 24 of Chapter 19 applies: continuity is
assumed there but only right-continuity is used; and minimality of filtration is
assumed but, in the terminology introduced above, only the fact that the Wiener
process there is a Wiener process with respect to the filtration $(\mathcal{F}_t: t \in [0,\infty))$
is used. □


The following corollary is a special case of a more general result with the same 
name. 


Corollary 6. [Blumenthal 0-1 Law] Let $(\mathcal{F}_{t+}: t \in [0,\infty))$ denote the minimal
right-continuous filtration of a Lévy process in $\mathbb{R}$. Then every event in $\mathcal{F}_{0+}$ has
probability 1 or 0.


PROOF. Denote the Lévy process by $Y$ and let $B \in \mathcal{F}_{0+}$. Because of the
hypothesis of a minimal filtration,

$$B \in \sigma(Y_t: t \ge 0) = \sigma((Y_t - Y_0): t \ge 0).$$

By Proposition 5, with $t = 0$, events that belong to this $\sigma$-field are independent
of events belonging to $\mathcal{F}_{0+}$. Since $B$ belongs to both of these $\sigma$-fields, $P(B \cap B) =
P(B)P(B)$ from which the desired conclusion follows. □


In the terminology introduced in Chapter 12, the conclusion of Corollary 6
can be restated as: The $\sigma$-field $\mathcal{F}_{0+}$ is 0-1 trivial.


Problem 21. Identify some interesting events that have probability 0 or 1 on the 
basis of the Blumenthal 0-1 Law. 


The following result says that Lévy processes start over at stopping times. 
Thus Lévy processes are ‘strong Markov processes’. 


Theorem 7. Let $Y$ be a Lévy process in $\mathbb{R}$ with respect to a filtration $(\mathcal{F}_t: t \ge 0)$.
Let $T$ be an almost surely finite stopping time with respect to this filtration.
Then the $\sigma$-field $\mathcal{F}_T$ is independent of the $D[0,\infty)$-valued random variable
$t \rightsquigarrow Y_{T+t} - Y_T$, $t \ge 0$, which is a Lévy process with respect to the filtration
$(\mathcal{F}_{T+t}: t \in [0,\infty))$ having the same distribution as $Y$ itself.


Problem 22. Prove the preceding theorem by adapting the proof of Theorem 25 
of Chapter 19. 


30.5. † Subordination


Let $Y$ be a Lévy process in $\mathbb{R}$ and $Z$ a subordinator that is independent of $Y$
and whose Lévy measure assigns the value 0 to the one-point set $\{\infty\}$. Define
$\tilde{Y}$ by $\tilde{Y}_\tau = Y_{Z_\tau}$. It is straightforward to use Theorem 7 to show that $\tilde{Y}$ is a
Lévy process in $\mathbb{R}$. It is said to be subordinate to $Y$ and it arises from $Y$ by
subordination using the subordinator $Z$. Let us use Proposition 5 of Chapter 23
to calculate the characteristic function of $\tilde{Y}_\tau$ in terms of functions associated
with $Y$ and $Z$:

(30.7)   $$E(e^{iu\tilde{Y}_\tau}) = E\bigl(E(e^{iuY_{Z_\tau}} \mid Z_\tau)\bigr) = E(e^{-Z_\tau \varphi(u)}) = e^{-\tau\gamma(\varphi(u))},$$

where $\varphi$ denotes the characteristic exponent of $Y_1$ and

$$\gamma(v) = -\log E(e^{-vZ_1}).$$


Example 2. Let us find the Lévy process that is subordinate to standard
Brownian motion using a strictly stable subordinator of index $\frac{1}{2}$. (See Problem 13
in Chapter 15.) For some $a > 0$ (depending on which stable subordinator is
used), the formula for the characteristic exponent in (30.7) becomes

$$a\Bigl[\frac{u^2}{2}\Bigr]^{1/2} = \frac{a|u|}{\sqrt{2}},$$

the characteristic exponent of a symmetric Cauchy process. Even though Brownian
motion has no jumps and finite expectation at each fixed time, a Lévy process
with jumps and undefined expectations at each time has been created from
Brownian motion by ‘sampling’ it at a random set of times.
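
This example can be illustrated by simulation at the single time 1. The sketch below rests on a standard fact that is not stated in the text and is therefore an assumption of the illustration: if $N$ is standard normal, then $a^2/(2N^2)$ has Laplace transform $v \mapsto e^{-a\sqrt{v}}$, so it can play the role of the subordinator at time 1. Evaluating an independent Brownian motion at that random time gives $\sqrt{Z_1}$ times an independent standard normal variable, which should be Cauchy with scale $a/\sqrt{2}$. The function names are hypothetical.

```python
import math
import random
import statistics

def sample_stable_half(a, rng):
    """One draw of a^2 / (2 N^2), N standard normal (Laplace transform exp(-a*sqrt(v)))."""
    n = rng.gauss(0.0, 1.0)
    while n == 0.0:          # guard against division by zero
        n = rng.gauss(0.0, 1.0)
    return a * a / (2.0 * n * n)

def sample_subordinated(a, rng):
    """Standard Brownian motion evaluated at an independent random time Z_1."""
    z = sample_stable_half(a, rng)
    return math.sqrt(z) * rng.gauss(0.0, 1.0)

def median_abs(a, size, rng):
    """Sample median of |value|; for a Cauchy law this estimates the scale parameter."""
    return statistics.median(abs(sample_subordinated(a, rng)) for _ in range(size))

if __name__ == "__main__":
    rng = random.Random(7)
    a = 2.0 * math.sqrt(2.0)          # predicted Cauchy scale a / sqrt(2) = 2
    print(median_abs(a, 20000, rng), a / math.sqrt(2.0))
```

The median of the absolute value is used because a Cauchy variable has no mean; for a symmetric Cauchy law with scale $\gamma$, exactly half of the mass lies in $[-\gamma, \gamma]$.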


Problem 23. Show that the strictly stable Lévy processes that are not subordinate
to any Lévy process of a different strictly stable type are exactly those of index greater
than or equal to 1 whose Lévy measure assigns the value 0 to $(0,\infty)$ or $(-\infty,0)$.
Hint: See Problem 13 of Chapter 15, where the moment generating functions of all
strictly stable distributions in $\overline{\mathbb{R}}^+$ are identified.


Problem 24. Suppose that a Lévy process $\tilde{Y}$ is subordinate to a Lévy process $Y$
using a subordinator $Z$. Let $\tilde{Q}_\tau$, $Q_t$, and $R_\tau$ denote the distributions of $\tilde{Y}_\tau$, $Y_t$,
and $Z_\tau$, respectively. Show that

(30.8)   $$\tilde{Q}_\tau(B) = \int_{[0,\infty)} Q_t(B)\,R_\tau(dt).$$

Apply this formula in the case that $Y$ is a standard Wiener process and $Z$ is a
strictly stable subordinator of index $\frac{1}{2}$ as described in Problem 13 and Problem 16,
both of Chapter 15. Confirm that your answer is consistent with Example 2.


* Problem 25. Apply (30.8) to find $\tilde{Q}_\tau$, in case $Y$ is a standard Poisson process and
$Z$ is a standard gamma process, giving your answer in terms of the means of $Y_1$
and $Z_1$.


30.6. † Local-time processes and regenerative subsets of $[0,\infty)$


We begin with an example in order to illustrate several related concepts. 


Example 3. Figure 30.1 illustrates, with short-dashed line segments, a standard
two-sided Poisson process $Y$ corresponding to the triple $(0, 0, \frac{1}{2}[\delta_1 + \delta_{-1}])$
via the Lévy-Khinchin Representation Theorem. The function $L_0$, shown with
the solid line segments, is called the ‘local-time process at 0’ of $Y$. Its value $L_0(t)$
at any particular $t$ equals the amount of time less than or equal to $t$ that $Y$ is
at 0. Similarly for each $y \in \mathbb{Z}$, the ‘local-time process at $y$’ is denoted by $L_y(t)$
and equals the Lebesgue measure of $\{s \le t: Y_s = y\}$. For $C \subseteq \mathbb{Z}$, the random
function

$$L_C = \sum_{y \in C} L_y$$

is the ‘occupation-time process’ of $C$; its value at $t$ is the Lebesgue measure of
$\{s \le t: Y_s \in C\}$. In Figure 30.1, the function shown with dots is $L_{-1}$ of the
particular $\omega$ being illustrated, and the function shown with long dashes is the
occupation-time process of $\{-1, 0\}$. For each fixed $t$ an ‘occupation-measure
process’ is obtained by letting $C$ vary. Its density with respect to counting
measure is the random function $y \rightsquigarrow L_y(t)$.

FIGURE 30.1. Local-time and occupation-time processes for the
zero set of a two-sided Poisson process

Let

$$\Theta(\omega) = \{t \in [0,\infty): Y_t(\omega) = 0\},$$

and for each $\omega$ denote its indicator function by $t \rightsquigarrow X_t(\omega)$. We call $X$ a ‘renewal
process’ and $\Theta$ a ‘regenerative set’. Clearly $X_t(\omega) = 1$ if and only if
$L_0(\omega, t + y) > L_0(\omega, t)$ for all positive $y$. It is straightforward to use the independent
increment and stationary increment properties of $Y$ to show that $X$ has
a property analogous to that in Proposition 1 of Chapter 25:

(30.9)   $$P[X_{t_n} = x_n \text{ for } 0 < n \le r+s] = P[X_{t_n} = x_n \text{ for } 0 < n \le r]\;P[X_{t_n - t_r} = x_n \text{ for } r < n \le r+s],$$

for all positive integers $r$ and $s$, sequences $(x_1, \dots, x_{r+s})$ of 0’s and 1’s such that
$x_r = 1$, and $t_1 < t_2 < \cdots < t_{r+s}$.

Unlike the situation in Chapter 25, there is no increasing random walk associated
with $L_0$, $\Theta$, and $X$. Instead there is a subordinator. The subordinator
$Z$ is the ‘right-continuous’ inverse function of $L_0$; its image is $\Theta$. It is shown
in Figure 30.1 with the vertical axis as its domain, its graph being obtained by
removing the horizontal line segments from the graph of $L_0$. To prove that $Z$ is
a subordinator one needs the fact that $Y$ has the strong Markov property with
respect to the minimal right-continuous filtration (Theorem 7 and Proposition 5)
as well as stationary independent increments. The reason that the strong Markov
property is needed is that a fixed time $\tau$ (on the vertical axis in Figure 30.1) in
the domain of $Z$ corresponds to the random time

$$\inf\{t: L_0(t) > \tau\},$$

which is a stopping time with respect to the minimal right-continuous filtration
of $Y$.
The random $\sigma$-finite measure $N$ on $[0,\infty)$ given by

(30.10)   $$N(B) = \int_B X_t\,\lambda(dt) = \lambda(\Theta \cap B),$$

where $\lambda$ denotes Lebesgue measure, is the ‘renewal measure’ of the renewal process
$t \rightsquigarrow X_t$. The ‘potential measure’ of $t \rightsquigarrow X_t$ is the (nonrandom) measure $U$
defined by

(30.11)   $$U(B) = E(N(B)) = \int_B P[X_t = 1]\,\lambda(dt) \quad\text{(by the Fubini Theorem).}$$

The following relations are clear and give the impression that we have introduced
redundant concepts and notation:

$$N[0,t] = L_0(t);$$
$$U[0,t] = E(L_0(t));$$

(30.12)   $$U(B) = \int_B P[Y_t = 0]\,\lambda(dt).$$

The density $t \rightsquigarrow P[Y_t = 0]$ of $U$ with respect to $\lambda$ is the ‘potential function’ of
the renewal process $t \rightsquigarrow X_t$. Problem 5 of Chapter 16 gives an explicit formula
for the potential function:

(30.13)   $$t \rightsquigarrow e^{-t} I_0(t),$$

where $I_0$ is a modified Bessel function of the first kind.
From Figure 30.1 we see that

(30.14)   $$U(B) = E(\lambda(\{\tau: Z_\tau \in B\})) = E(\lambda(\Theta \cap B)).$$

By the Fubini Theorem,

(30.15)   $$U(B) = \int_0^\infty R_\tau(B)\,d\tau,$$

where $R_\tau$ denotes the distribution of $Z_\tau$. Another application of the Fubini
Theorem gives us a relation between the ‘Laplace-Stieltjes transform’ of $U$ and
the pair $(\xi, \nu)$, where $\xi$ and $\nu$ are the drift and Lévy measure of $Z$:

(30.16)   $$\int_{[0,\infty)} e^{-vt}\,U(dt) = \int_0^\infty \exp\Bigl[-\tau\Bigl(\xi v + \int_{(0,\infty]} (1 - e^{-vt})\,\nu(dt)\Bigr)\Bigr]\,d\tau = \frac{1}{\xi v + \int_{(0,\infty]} (1 - e^{-vt})\,\nu(dt)}, \quad v > 0.$$

We are able to calculate the Laplace-Stieltjes transform of $U$ using (30.12),
(30.13), and either the defining series (16.4) of $I_0$ or a table of Laplace transforms:

$$\int_{[0,\infty)} e^{-vt}\,U(dt) = \frac{1}{\sqrt{v^2 + 2v}}, \quad v > 0.$$

Hence

(30.17)   $$\xi v + \int_{(0,\infty]} (1 - e^{-vt})\,\nu(dt) = \sqrt{v^2 + 2v}.$$
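
The Laplace transform identity used here can be checked numerically without a table. The sketch below evaluates the potential function $t \mapsto e^{-t}I_0(t)$ from the power series $I_0(t) = \sum_k (t/2)^{2k}/(k!)^2$ and integrates it against $e^{-vt}$ by Simpson's rule on a truncated interval; the result is compared with $1/\sqrt{v^2 + 2v}$. The truncation point and step count are illustrative choices, and the function names are hypothetical.

```python
import math

def potential_function(t, tol=1e-17):
    """e^{-t} I_0(t), summing the series for I_0 with the factor e^{-t} built in."""
    term = math.exp(-t)            # k = 0 term, already multiplied by e^{-t}
    total = term
    q = (t / 2.0) ** 2
    k = 0
    while True:
        k += 1
        term *= q / (k * k)        # ratio of consecutive series terms
        total += term
        if term < tol * total and k > t:
            return total

def laplace_of_potential(v, upper=60.0, steps=6000):
    """Simpson approximation of the integral over [0, upper] of e^{-v t} e^{-t} I_0(t) dt.

    `steps` must be even; the neglected tail is of order exp(-v * upper).
    """
    h = upper / steps

    def integrand(t):
        return math.exp(-v * t) * potential_function(t)

    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3.0

if __name__ == "__main__":
    for v in (0.5, 1.0, 3.0):
        print(v, laplace_of_potential(v), 1.0 / math.sqrt(v * v + 2.0 * v))
```

The two printed columns should agree to several decimal places for values of $v$ that are not too close to 0, where the truncation of the integral matters more.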


To calculate € and v from (30.17) we first let v N 0 to obtain v{oo} = 0. 
Then we write the integral as a Riemann-Stieltjes integral, integrate by parts, 
and divide throughout by v to obtain 


O [ e “y(t, oc) dt = y1 + (2/v). 


Let v + œ to obtain € = 1, a conclusion that we could have also obtained from 
Figure 30.1. We thus obtain a formula for the (ordinary) Laplace transform of 
the function t ~ v(t, co): 


v~~ Y14+(2/v) -1. 


By using a power series expansion or a table of inverse Laplace transforms we 
obtain 
v(t, œ) =e *[Ip(t) + (t). 
The fact $\xi = \nu(0,\infty]$ implies that whenever $Y$ is at 0, the amount of additional
time it spends there before leaving has mean 1 (and is of course exponentially
distributed). The function

(30.18)   $t \mapsto \dfrac{\nu(0,t]}{\nu(0,\infty]} = 1 - e^{-t}[I_0(t) + I_1(t)]$

is the distribution function of each of the independent random variables
$(T_1 - S_1), (T_2 - S_2), \dots$ that we now define using $T_0 = 0$:

$S_n(\omega) = \inf\{t > T_{n-1}(\omega)\colon Y_t(\omega) \ne 0\}$;
$T_n(\omega) = \inf\{t > S_n(\omega)\colon Y_t(\omega) = 0\}$.

Thus $S_n$ is the sum of $2n - 1$ independent random variables, $n$ of which are
exponential with mean 1 and $n - 1$ of which have the distribution function (30.18).




Similarly Tn is the sum of 2n independent random variables, n of which are 
exponential with mean 1 and n of which have the distribution function (30.18). 


Problem 26. Summarize the various concepts and interconnections described in 
the preceding example. In doing so make sure you identify which objects are ran- 
dom objects and which are descriptive quantities connected with random objects. 
Also, say which conclusions have potential for generalization beyond two-sided 
Poisson processes and which conclusions are specific to these particular processes. 


Problem 27. Mimic Example 3 for $Y$ a compound Poisson with Lévy measure
$p\delta_1 + (1 - p)\delta_{-1}$. Also, find the distribution of the Lebesgue measure of ©. Hint:
You may find your solution of Problem 7 of Chapter 16 useful.


Problem 28. Mimic Example 3 for Y a compound Poisson with Lévy measure 
[ói + 6-1 +r]. Also, find the distribution of the Lebesgue measure of ©. Check 
for consistency with the formulas in Example 3. 


The random set © of Example 3 satisfies (30.9) for tn and zn as described 
there and has the property that both it and its complement in Rt are unions of 
half-open intervals, each of which contains its own left endpoint. Any random set 
satisfying these conditions is called a regenerative set in R*™, and its (random) 
indicator function is a renewal process. [See Appendix G for a comment on these 
and the forthcoming definitions.] Equations (30.10) and (30.11) of Example 3 
are the definitions of the renewal measure N and potential measure U of a re- 
generative set X and corresponding renewal process X. The random function 
t~ L(t) df y [0, t] is the local-time process of &. As in Example 3, its ‘right- 
continuous inverse’ Z is a subordinator with drift 1 and finite Lévy measure. 
The renewal process X can be recovered from Z: X;(w) = 1 if and only if t is in 
the image of Z(w) and t < oo. 

The potential measure $U$ is related, as at (30.15), to the distributions $R_s$ of
the $\mathbb{R}^+$-valued random variables $Z_s$:

$U(B) = \int_0^\infty R_s(B)\,ds$.

The density of $U$ with respect to Lebesgue measure is the function $t \mapsto P[X_t = 1]$,
called the potential function of $X$. The calculation (30.16) is valid in general, so

(30.19)   $\int_{[0,\infty)} e^{-vt}\,U(dt) = \dfrac{1}{v + \int_{(0,\infty]} (1 - e^{-vt})\,\nu(dt)}$,   $v > 0$.

Returning to Example 3, we note that $L_y$ for $y \ne 0$ is not the local-time process
of a regenerative set. It is the local-time process of a delayed regenerative set
in $\mathbb{R}^+$, the formal definition of which we omit since it is very similar to that
of a delayed regenerative set in $\mathbb{Z}^+$. For any compound Poisson process $Y$,
$(t,y) \mapsto L_y(t)$ can be defined as it was for the two-sided compound Poisson
process in Example 3. For a Borel subset $C$ of $\mathbb{R}$, $t \mapsto \sum_{y \in C} L_y(t)$ is the
occupation-time process of $C$.


Problem 29. Characterize those regenerative sets in R* that have the property 
that their complements are delayed regenerative sets. 


Theorem 8. Let $(X_t\colon t \in [0,\infty))$ be a renewal process, and denote by $\nu$ the
Lévy measure of the corresponding subordinator with drift 1. Then

(30.20)   $\lim_{t \to \infty} P[X_t = 1] = \dfrac{1}{1 + \int_{(0,\infty)} y\,\nu(dy)}$.

Also, $X_t = 0$ for all sufficiently large $t$ (depending on $\omega$) if and only if $\nu\{\infty\} > 0$.


PARTIAL PROOF. The last assertion of the theorem is obvious. There are two 
aspects of the first assertion: the existence of a limit and its value. 

For the existence we let Z denote the subordinator corresponding to X and 
use Problem 13 to obtain 


(30.21)   $P[X_t = 1] \ge P[X_s = 1 \text{ for } s \in [0,t]] = e^{-t\nu(0,\infty]} > 0$.


Now we fix $b > 0$ and consider the sequence $(X_{bn}\colon n = 0,1,2,\dots)$. It is easy to
see that it is a renewal sequence. By (30.21) it has period 1. So by the Renewal
Theorem (Theorem 16 of Chapter 25), there exists a constant $c \in [0,1]$ such that

$\lim_{n \to \infty} P[X_{bn} = 1] = c$.

Consider $t$ between $bn$ and $b(n+1)$. By (30.9) and (30.21),

$P[X_t = 1] \ge P[X_{bn} = 1]\,P[X_{t-bn} = 1]$
$\ge P[X_{bn} = 1]\,e^{-(t-bn)\nu(0,\infty]}$
$\ge P[X_{bn} = 1]\,e^{-b\nu(0,\infty]}$,

and by a similar sequence of inequalities,

$P[X_{b(n+1)} = 1] \ge P[X_t = 1]\,e^{-b\nu(0,\infty]}$.

Hence

$c\,e^{-b\nu(0,\infty]} \le \liminf_{t \to \infty} P[X_t = 1] \le \limsup_{t \to \infty} P[X_t = 1] \le c\,e^{b\nu(0,\infty]}$.

Now let $b \searrow 0$ to obtain $\lim_{t \to \infty} P[X_t = 1] = c$.
To evaluate $c$ we rely in part on the problem following this proof which gives

(30.22)   $\lim_{v \searrow 0} v \int_0^\infty e^{-vt}\,P[X_t = 1]\,dt = c$.

By (30.19), this is equivalent to

(30.23)   $\lim_{v \searrow 0} \dfrac{v}{v + \int_{(0,\infty]} (1 - e^{-vt})\,\nu(dt)} = c$.


It is left for the reader to now show that $c$ equals the right side of (30.20). □
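The identification of the limit can be illustrated numerically for a Lévy measure with finitely many atoms, for which $\int (1 - e^{-vy})\,\nu(dy)$ is a finite sum. The sketch below is an illustration, not a proof; the atoms chosen are hypothetical and the function names are ours.

```python
import math

def laplace_ratio(v, atoms):
    """v / (v + int (1 - e^{-v y}) nu(dy)) for nu = sum of mass * delta_y over atoms."""
    integral = sum(mass * -math.expm1(-v * y) for y, mass in atoms)
    return v / (v + integral)

def limit_value(atoms):
    """The value 1 / (1 + int y nu(dy)) appearing in Theorem 8."""
    return 1.0 / (1.0 + sum(mass * y for y, mass in atoms))

# Hypothetical finite Levy measure: mass 2 at y = 0.5 and mass 0.25 at y = 3.
atoms = [(0.5, 2.0), (3.0, 0.25)]
for v in (1.0, 1e-2, 1e-4, 1e-6):
    print(v, laplace_ratio(v, atoms))
print("limit:", limit_value(atoms))
```

As $v$ decreases the printed ratios approach the value on the last line, because $(1 - e^{-vy})/v \to y$ for each atom.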


Problem 30. Prove (30.22), and then finish the preceding proof by showing that 
the left side of (30.23) equals the right side of (30.20). 


In view of Theorem 8 and the definitions in Chapter 25, we say that a renewal
process is positive recurrent if $\int y\,\nu(dy) < \infty$, null recurrent if $\int y\,\nu(dy) = \infty$
but $\nu\{\infty\} = 0$, and transient if $\nu\{\infty\} > 0$.


30.7. † Sample function properties of subordinators


It is natural to ask for the probability that a given random function, such as 
a Lévy process, has certain properties like continuity or differentiability. For 
Lévy processes, there is a vast literature treating questions of this sort. In view 
of the prevalence of 0-1 laws, it is not surprising that for many properties, the 
probability is 0 or 1. For illustrative purposes we give a few such properties for 
subordinators in this section. 


Theorem 9. Let $Z$ be a subordinator with drift $\xi$. Then

$\dfrac{dZ_t}{dt}\Big|_{t=0} = \xi$ a.s.


* Problem 31. Prove the preceding theorem according to the following steps:
(i) Show that without loss of generality one may assume $\xi = 0$; then use this
assumption in the remaining steps.

(ii) Show that without loss of generality one may assume that the Lévy measure
$\nu$ satisfies $\nu(1,\infty] = 0$; then use this assumption in the remaining steps.

(iii) Show that $t^{-1} Z_t \to 0$ i.p. as $t \searrow 0$.

(iv) Show that $E(Z_t) < \infty$ for all $t$.

(v) Show that $n \mapsto 2^n Z_{2^{-n}}$ is a martingale.

(vi) Complete the proof. 


* Problem 32. For any subordinator $Z$ that never takes the value $\infty$ and any $s \in
[0,\infty)$, show that $\frac{dZ_t}{dt}\big|_{t=s} = \xi$ a.s., where $\xi$ denotes the drift of $Z$. One might
carelessly conclude from this fact that $Z_t = \xi t$ a.s. Identify the carelessness that is
likely to be involved.




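Theorem 10. Let $\nu$ denote the Lévy measure of a subordinator $Z$ having 0
drift, and let $g$ be a function from $\mathbb{R}^+$ to $\mathbb{R}^+$ satisfying: $g(0) = g(0+) = 0$ and
$x \mapsto g(x)/x$ is increasing on $(0,\infty)$. Then

$\limsup_{t \searrow 0} \dfrac{Z_t}{g(t)} = \begin{cases} 0 \text{ a.s.} & \text{if } \int_0^1 \nu[g(t),\infty]\,dt < \infty \\ \infty \text{ a.s.} & \text{otherwise.} \end{cases}$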


Note that in Theorem 10, only the behavior of g near 0 is relevant. 


Problem 33. Prove Theorem 10 according to the following steps:
(i) Let $X$ denote the Poisson point process in the Itô representation of $Z$.
(ii) Assume that $\int_0^1 \nu[g(t),\infty]\,dt = \infty$, let $c \in (0,\infty)$, and show that the intensity
measure of $X$ assigns infinite measure to $\{(s,x)\colon x > cg(s),\ s < t\}$ for
every $t > 0$.
(iii) Use the preceding step to deduce that

$\limsup_{t \searrow 0} \dfrac{Z_t - Z_{t-}}{g(t)} \ge c$ a.s.

(iv) Note that the preceding inequality remains true if $Z_{t-}$ is removed from the
numerator. Then let $c \to \infty$ to complete half of the proof.
(v) Assume that $\int_0^1 \nu[g(t),\infty]\,dt < \infty$, and let $b \in (0,\infty)$.

(vi) Define ù by ù|z, co] = v[bg(x), œc] for z € (0,00), and prove that v is a Lévy 
measure for R`. 

(vii) Define a point process X by X((0, t] x [x, oo]) = X ([0, t] x [g(x), œ0]), and say 
why it is Poisson with intensity measure A x v, where A denotes Lebesgue 
measure. 

(viii) Define Z via the Ité Representation Theorem using the Poisson point process 
X and drift 0. Then show that Z(t) < g~!(Z,) for all t. 

(ix) Complete the proof by using Theorem 9 and then letting $b \searrow 0$.


Problem 34. For a gamma process $Z$ find which numbers $\beta \in [1,\infty)$ have the
property that $Z_t/t^{\beta} \to 0$ a.s. as $t \searrow 0$ and which have the property that

$\limsup_{t \searrow 0} \dfrac{Z(t)}{t^{\beta}} = \infty$ a.s.


* Problem 35. Repeat the preceding problem with the gamma process replaced by
a strictly stable subordinator of index $\alpha \in (0,1]$.


It turns out that the image of any subordinator with 0 drift has Lebesgue
measure 0. In the other direction, the image is also uncountable unless the Lévy
measure is finite. We want a method to distinguish the various images that have
zero Lebesgue measure.


Definition and Proposition 11. Let $h\colon [0,\infty) \to [0,\infty)$ be a continuous
strictly increasing function satisfying $h(0) = 0$. The set function $h$-meas defined
on the Borel $\sigma$-field of $\mathbb{R}$ by

(30.24)   $h\text{-meas}(B) = \lim_{\varepsilon \searrow 0}\ \inf\Bigl\{\sum_{k=1}^\infty h(|J_k|)\colon B \subseteq \bigcup_{k=1}^\infty J_k,\ |J_k| \le \varepsilon \text{ for all } k\Bigr\}$,

where each $J_k$ is an interval, possibly empty, and $|J_k|$ denotes the length of $J_k$,
is a measure called Hausdorff $h$-measure.


We omit the proof that $h$-meas is a measure. The usual Cantor set $C$ can be
covered by $2^n$ intervals each of length $3^{-n}$. Thus, if $h(x) = x^{\log 2/\log 3}$ the sum
in (30.24) equals 1; in fact, it can be proved that $h$-meas$(C) = 1$.

More generally, let $h_\beta(x) = x^\beta$ for some $\beta > 0$. It is easy to show that for
any Borel set $B$ in $\mathbb{R}$, there exists a unique $\alpha \in [0,\infty)$ such that

$h_\beta\text{-meas}(B) = \begin{cases} \infty & \text{if } 0 < \beta < \alpha \\ 0 & \text{if } \alpha < \beta < \infty. \end{cases}$

Both Definition 11 and this fact can easily be generalized to $\mathbb{R}^d$ with convex
sets replacing intervals and diameter replacing length. The unique number $\alpha$ is
called the Hausdorff dimension of $B$. By working with the function $h_d$ one easily
shows that the Hausdorff dimension of every Borel subset of $\mathbb{R}^d$ is less than or
equal to $d$, and those that have positive $d$-dimensional Lebesgue measure have
Hausdorff dimension $d$. The preceding paragraph shows that the usual Cantor
set has dimension $\log 2/\log 3$.
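The role of the exponent $\log 2/\log 3$ can be seen by evaluating the covering sum for the natural stage-$n$ cover of the Cantor set ($2^n$ intervals of length $3^{-n}$) with $h_\beta(x) = x^\beta$. The sketch below only evaluates these particular covers, so it gives upper bounds for the infimum in the definition of Hausdorff measure rather than the measure itself; the function name is ours.

```python
import math

def cover_sum(beta, n):
    """Sum of h_beta(|J_k|) over the 2^n stage-n intervals, each of length 3^-n."""
    return (2.0 ** n) * (3.0 ** (-n * beta))

dim = math.log(2) / math.log(3)     # about 0.6309
for beta in (0.5, dim, 0.8):
    print(round(beta, 4), [cover_sum(beta, n) for n in (1, 10, 40)])
```

For $\beta$ below $\log 2/\log 3$ the sums grow with $n$, for $\beta$ above it they shrink toward 0, and at $\beta = \log 2/\log 3$ they stay equal to 1 up to rounding.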
There are several possibilities for Borel sets $B$ in $\mathbb{R}$ of Hausdorff dimension
$\alpha < 1$:

• $h_\alpha$-meas$(B) = 0$,

• $0 < h_\alpha$-meas$(B) < \infty$,

• $h_\alpha$-meas$(B) = \infty$ and there exist $B_1, B_2, \dots$ such that $B = \bigcup_{n=1}^\infty B_n$
and $h_\alpha$-meas$(B_n) < \infty$ for all $n$,

• $h_\alpha$-meas$(B) = \infty$ and there do not exist $B_1, B_2, \dots$ such that $B =
\bigcup_{n=1}^\infty B_n$ and $h_\alpha$-meas$(B_n) < \infty$ for all $n$.

If $B$ falls into one of the middle two of these four cases, we regard $h_\alpha$ as ‘an
appropriate’ function for ‘measuring’ $B$. In the other two cases it is natural to
look for a function other than a power function to ‘measure’ $B$. The purpose
of the preceding discussion is to set the background for the following theorem
which we give without proof.


Theorem 12. The image of a strictly stable subordinator $Z$ of index $\alpha$ has
Hausdorff dimension $\alpha$. Set $h(0) = 0$ and $h(x) = x^\alpha \log^{1-\alpha}(\log(1/x))$ for $x \in
(0,1)$. Then $h$ is an appropriate function for measuring the image of $Z$.


CHAPTER 31 
Introduction to Markov Processes 


In many ways, continuous time is a more natural setting than discrete time for 
the study of random time evolutions. Fortunately, there is a close connection 
between the two settings, particularly in the case of Markovian time evolutions. 
In this chapter, we build on much of what was done for Markov sequences in 
Chapter 26 in order to develop the basic theory of Markov processes. Our main 
goal in doing so is to prepare the way for the final two chapters in which two of 
the most important classes of Markov processes are studied. 


31.1. Cadlag space 


In the modern study of Markov processes, it is common to view a Markov process
as a random variable $X$, where for each $\omega$, $X(\omega)$ is a function from the ‘time-line’
$[0,\infty)$ to a ‘state space’ $\Psi$. Thus, we need to define a measurable space
consisting of such functions. Since convergence issues will play an important
role, we will restrict our attention to state spaces that are Polish. (The following
definition generalizes the space $D[0,\infty)$, described in Chapter 30, to the Polish
space setting.)


Definition 1. Let $(\Psi,\rho)$ be a Polish space. A function $\varphi\colon [0,\infty) \to \Psi$ is
cadlag if it is right continuous and the limit

$\lim_{s \nearrow t} \varphi(s)$

exists for all $t \in (0,\infty)$. The space of all cadlag functions $\varphi\colon [0,\infty) \to \Psi$ is
denoted by $D([0,\infty),\Psi)$ and called cadlag space.
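The simplest cadlag functions are step functions with finitely many jumps. The sketch below represents one and distinguishes the value $\varphi(t)$ from the left limit $\varphi(t-)$; the class and method names are ours, and the states are arbitrary labels, which may be regarded as points of a countable discrete (hence Polish) state space.

```python
from bisect import bisect_left, bisect_right

class StepFunction:
    """Piecewise-constant cadlag function on [0, infinity) with finitely many jumps."""

    def __init__(self, jump_times, values):
        # values[i] is the value on [jump_times[i], jump_times[i+1]); jump_times[0] is 0.
        assert jump_times[0] == 0 and len(jump_times) == len(values)
        assert all(a < b for a, b in zip(jump_times, jump_times[1:]))
        self.times, self.values = list(jump_times), list(values)

    def __call__(self, t):
        """phi(t): right continuity means a jump time already carries the new value."""
        return self.values[bisect_right(self.times, t) - 1]

    def left_limit(self, t):
        """phi(t-) for t > 0: the value held just before time t."""
        return self.values[bisect_left(self.times, t) - 1]

phi = StepFunction([0, 1, 2.5], ["a", "b", "c"])
print(phi(1), phi.left_limit(1), phi(0.99))
```

At the jump time 1 the function already takes its new value, while the left limit there is the old value.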


For $t \in [0,\infty)$ and $\varphi \in D([0,\infty),\Psi)$, we commonly write $\varphi_t$ for $\varphi(t)$. This
notation is consistent with that used earlier in Chapters 19 and 30. We make
$D([0,\infty),\Psi)$ into a measurable space by introducing the $\sigma$-field $\mathcal{H}$ generated by
sets of the form

$\{\varphi\colon \varphi_t \in B\}$,

for Borel $B \subseteq \Psi$ and $t \in [0,\infty)$. Usually, this $\sigma$-field will not be mentioned
explicitly. A $D([0,\infty),\Psi)$-valued random variable $X$ is a random cadlag function,
and $X_t$ is the state of $X$ at time $t$.

One reason for working with the space $D([0,\infty),\Psi)$ is the following theorem.


Theorem 2. Let $(\Psi,\rho)$ be a Polish space and $(D([0,\infty),\Psi),\mathcal{H})$ the corresponding
measurable cadlag space. Then there exists a metric $\tilde{\rho}$ on $D([0,\infty),\Psi)$
such that $(D([0,\infty),\Psi),\tilde{\rho})$ is a Polish space and $\mathcal{H}$ is the $\sigma$-field of Borel sets in
$D([0,\infty),\Psi)$.


We omit the highly technical proof of this result. Its main importance is that it
allows us to study the convergence of sequences of distributions on $D([0,\infty),\Psi)$.
This fact is particularly important when constructing a relatively complicated 
stochastic process by taking the limit of a sequence of simpler stochastic pro- 
cesses. However, we will not need to use the theorem in this way. Our only 
application of it is the following. 


Corollary 3. Let $X$ be a random cadlag function defined on a probability
space $(\Omega,\mathcal{F},P)$. If $\mathcal{G}$ is a sub-$\sigma$-field of $\mathcal{F}$, then there exists a conditional
distribution of $X$ given $\mathcal{G}$, and this conditional distribution is unique in the sense
given in Proposition 16 of Chapter 21.


Problem 1. Show that the distribution of a random cadlag function $X$ uniquely
determines and is uniquely determined by its finite-dimensional distributions, which
are the distributions of finite sequences of the form $(X_{t_1},\dots,X_{t_n})$, for positive
integers $n$ and times $t_1,\dots,t_n \in [0,\infty)$.


* Problem 2. Let $X$ be a random cadlag function defined on a probability space
$(\Omega,\mathcal{F},P)$ and $\mathcal{G}$ a sub-$\sigma$-field of $\mathcal{F}$. Let $Q$ be the conditional distribution of $X$
given $\mathcal{G}$, and for each $t \in [0,\infty)$, let $Q_t$ be the marginal of $Q$ corresponding to $t$;
that is, $Q_t$ is the random distribution on $\Psi$ defined by $Q_t(\cdot) = Q[X_t \in \cdot\,]$. Show
that for each $t \in [0,\infty)$, $Q_t$ is the conditional distribution of $X_t$ given $\mathcal{G}$, and

$Q_u \to Q_t$ a.s. as $u \searrow t$.


31.2. Markov, strong Markov, and Feller processes 


In Chapter 26 we defined a Markov sequence in terms of a transition operator. 
In continuous time, a single transition operator is insufficient. 


Definition 4. Let $\Psi$ be a Polish space. A transition semigroup for $\Psi$ is a
collection $(T_t\colon t \in [0,\infty))$ of transition operators for $\Psi$ satisfying the following
properties:

(i) $T_0 f = f$ for all bounded measurable $f\colon \Psi \to \mathbb{R}$;
(ii) $T_s T_t = T_{s+t}$ for all $s,t \in [0,\infty)$;
(iii) $\lim_{t \searrow 0} T_t f(x) = f(x)$ for all bounded continuous $f\colon \Psi \to \mathbb{R}$ and
$x \in \Psi$.


Since transition operators are defined in terms of transition distributions, a
transition semigroup is necessarily associated with a collection $(\mu_{x,t}\colon x \in \Psi, t \in
[0,\infty))$ of probability measures on $\Psi$, with

(31.1)   $T_t f(x) = \int_\Psi f(y)\,\mu_{x,t}(dy)$,

for all $x \in \Psi$, $t \in [0,\infty)$. These probability measures are the transition distributions
associated with the transition semigroup $(T_t\colon t \in [0,\infty))$.
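A concrete instance may help fix ideas. The sketch below takes, as an assumed example not drawn from the text, the rate-1 Poisson process on the state space $\{0,1,2,\dots\}$, for which $\mu_{x,t}$ is the distribution of $x + N$ with $N$ Poisson of mean $t$, and checks properties (i) and (ii) of a transition semigroup numerically for one test function. The names and the truncation `kmax` are ours.

```python
import math

def poisson_pmf(t, kmax=120):
    """List of P[N = k], k = 0, ..., kmax - 1, for N Poisson with mean t."""
    p = [math.exp(-t)]
    for k in range(1, kmax):
        p.append(p[-1] * t / k)
    return p

def T(t, f, kmax=120):
    """Return the function T_t f, where T_t f(x) = sum_k P[N = k] f(x + k)."""
    pmf = poisson_pmf(t, kmax)
    return lambda x: sum(p * f(x + k) for k, p in enumerate(pmf))

f = lambda x: math.exp(-0.3 * x)      # a bounded function on {0, 1, 2, ...}
s, t, x = 0.7, 1.9, 2
lhs = T(s, T(t, f))(x)                # (T_s T_t f)(x)
rhs = T(s + t, f)(x)                  # (T_{s+t} f)(x)
print(lhs, rhs, T(0.0, f)(x), f(x))
```

The first two printed numbers agree because the sum of independent Poisson variables with means $s$ and $t$ is Poisson with mean $s+t$; the last two agree because $\mu_{x,0} = \delta_x$.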


* Problem 3. Find necessary and sufficient conditions on $(\mu_{x,t}\colon x \in \Psi, t \in [0,\infty))$
so that $(T_t\colon t \in [0,\infty))$ defined by (31.1) is a transition semigroup.


We are now ready to define time-homogeneous Markov processes, distribu- 
tions, and families. 


Definition 5. Let $\Psi$ be a Polish space and $(T_t\colon t \in [0,\infty))$ a transition semigroup
for $\Psi$, with associated transition distributions $\mu_{x,t}$, $x \in \Psi$, $t \in [0,\infty)$. A
$D([0,\infty),\Psi)$-valued random variable $X$ adapted to a filtration $(\mathcal{F}_t\colon t \in [0,\infty))$
is a time-homogeneous Markov process with respect to that filtration, having
state space $\Psi$ and transition semigroup $(T_t\colon t \in [0,\infty))$, if for all $s,t \in [0,\infty)$
the conditional distribution of $X_{s+t}$ given $\mathcal{F}_s$ is $\mu_{X_s,t}$. The distribution of such
a process is a time-homogeneous Markov distribution with transition semigroup
$(T_t\colon t \in [0,\infty))$.


We will often omit the adjective ‘time-homogeneous’, since we will not con- 
sider other types of Markov processes or distributions in this book. If a stochastic 
process X is called a Markov process without any reference to a filtration, then 
the minimal filtration of X is implied. 


Definition 6. A Markov family of processes is a collection $\{X^x\colon x \in \Psi\}$
of Markov processes with common transition semigroup $(T_t\colon t \in [0,\infty))$, such
that $X^x_0 = x$ a.s. for every $x \in \Psi$. The corresponding collection $(Q^x\colon x \in
\Psi)$ of Markov distributions is a Markov family of distributions with transition
semigroup $(T_t\colon t \in [0,\infty))$.


Often one is able to construct a collection of distributions $(Q^x\colon x \in \Psi)$ on
$D([0,\infty),\Psi)$ that is a candidate for being a Markov family, but the construction
is made in the absence of a transition semigroup. The following simple but useful
result tells us when there exists a transition semigroup that makes the collection
of probability measures into a Markov family.


Proposition 7. Let $\Psi$ be a Polish space, and for $t \in [0,\infty)$, set

$\mathcal{H}_t = \sigma(\varphi_s\colon s \in [0,t],\ \varphi \in D([0,\infty),\Psi))$.

Then a family $(Q^x\colon x \in \Psi)$ of distributions on $(D([0,\infty),\Psi),\mathcal{H})$ is a Markov
family if and only if the following three conditions are satisfied:
(i) For all $x \in \Psi$, $Q^x\{\varphi\colon \varphi_0 = x\} = 1$;
(ii) For all $t \in [0,\infty)$, $x \mapsto Q^x\{\varphi\colon \varphi_t \in \cdot\,\}$ is a measurable function from
$\Psi$ to the measurable space of probability measures on $\Psi$;
(iii) For all $x \in \Psi$, $s,t \in [0,\infty)$, and Borel sets $A \subseteq \Psi$,

$Q^x[\varphi_{s+t} \in A \mid \mathcal{H}_s](\vartheta) = Q^{\vartheta_s}\{\varphi\colon \varphi_t \in A\}$

for $Q^x$-a.e. $\vartheta \in D([0,\infty),\Psi)$.

Under these conditions, the transition semigroup $(T_t\colon t \in [0,\infty))$ of the Markov
family is defined by

$T_t f(x) = \int f(\varphi_t)\,Q^x(d\varphi)$

for $x \in \Psi$, $t \in [0,\infty)$, and bounded measurable $f\colon \Psi \to \mathbb{R}$.


Problem 4. Prove the preceding proposition. 


We have described a one-to-one correspondence between a certain collection of
transition semigroups and the collection of all Markov families $(Q^x\colon x \in \Psi)$. We
will not deal with the general question of which transition semigroups correspond
to a Markov family, although we will give (without proof) a sufficient condition
at the end of this section. Problem 6 below shows that an individual Markov
distribution can correspond to more than one transition semigroup.

The initial distribution of a Markov process $X$ is the distribution of $X_0$. If
the initial distribution of $X$ is a delta distribution $\delta_x$ for some $x$ in the state
space, then we call $x$ the initial state of $X$. A Markov distribution is uniquely
determined by the corresponding initial distribution and transition semigroup
(see Problem 5 below).

Let $(Q^x\colon x \in \Psi)$ be a Markov family with transition semigroup $(T_t\colon t \in
[0,\infty))$ and $Q_0$ a probability distribution on $\Psi$. Then

(31.2)   $C \mapsto \int_\Psi Q^x(C)\,Q_0(dx)$,   $C \in \mathcal{H}$,

is the Markov distribution having initial distribution $Q_0$ and transition semigroup
$(T_t\colon t \in [0,\infty))$.

The function $\varphi \mapsto \varphi_t$ from $D([0,\infty),\Psi)$ to $\Psi$ is measurable. This function
and the Markov distribution (31.2) induce a distribution on $\Psi$, namely, the
distribution of the state of the Markov process at time $t$ when the initial distribution
is $Q_0$. If this induced distribution is also $Q_0$ for every $t$, we say that $Q_0$ is
an equilibrium distribution for the Markov family, and also for the corresponding
transition semigroup.




Problem 5. Show that the distribution of a Markov process is uniquely determined 
by its initial distribution and transition semigroup. 


Problem 6. Show that the delta distribution at the identically zero function in
$D([0,\infty),\mathbb{R})$ is a Markov distribution with (at least) two different transition
semigroups $(T_t\colon t \in [0,\infty))$ and $(\tilde{T}_t\colon t \in [0,\infty))$ given by

$T_t f(x) = f(x)$ and $\tilde{T}_t f(x) = f(x + t\,\mathrm{sgn}\,x)$,

where $\mathrm{sgn}\,x$ is defined to equal 1, 0, or $-1$ according as $x > 0$, $x = 0$, or $x < 0$.


The proof of the following proposition is requested in Problem 7. To prepare
for its statement, we remark that a function $f$ from a Polish space $\Psi$ (in fact,
any topological space) to $\mathbb{R}$ is said to vanish at $\infty$ if for every $\varepsilon > 0$ there exists a
compact subset $C$ of $\Psi$ such that $|f(x)| < \varepsilon$ for $x \in \Psi \setminus C$. Notice that continuous
functions that vanish at $\infty$ are necessarily bounded and uniformly continuous.


Proposition 8. Let $\Psi$ be a Polish space, $(T_t\colon t \in [0,\infty))$ a transition semigroup
for $\Psi$, $X$ a $D([0,\infty),\Psi)$-valued random variable, and $(\mathcal{F}_t\colon t \in [0,\infty))$ a
filtration to which $X$ is adapted. The following are equivalent:

(i) $X$ is Markov with respect to $(\mathcal{F}_t\colon t \in [0,\infty))$;
(ii) For all times $s,t \in [0,\infty)$ and bounded measurable functions $f\colon \Psi \to \mathbb{R}$,

$E(f \circ X_{s+t} \mid \mathcal{F}_s) = T_t f(X_s)$;

(iii) The equation in (ii) holds for all times $s,t \in [0,\infty)$ and all continuous
functions $f\colon \Psi \to \mathbb{R}$ that vanish at $\infty$;

(iv) For all $t > 0$, the random sequence $(X_0, X_t, X_{2t}, \dots)$ is Markov with
respect to the filtration $(\mathcal{F}_{nt}\colon n = 0,1,2,\dots)$, with transition operator
$T_t$.


Problem 7. Prove Proposition 8. Hint: For the equivalence of (i) and (iii), use 
Corollary 7 of Chapter 18. For the equivalence of (i) and (iv), use Problem 2. 


Problem 8. Show that the three conditions in Definition 4 are ‘necessary’, in the 
sense that if any one of them fails, a corresponding Markov family could not exist. 
Hint: For the second property, take expectations on both sides of the equation in 
(ii) of Proposition 8. 


Problem 9. Show that any Lévy process is a Markov process with state space R 
and initial state 0, and find the corresponding transition semigroup (or transi- 
tion distributions). What is the appropriate Markov family associated with this 
transition semigroup? 




Problem 10. [Chapman-Kolmogorov equations] Let $X$ be a Markov process with
state space $\Psi$, initial distribution $Q_0$, and transition distributions $\mu_{x,t}$. Show that
for all positive integers $n$, bounded measurable functions $f\colon \Psi^n \to \mathbb{R}$, and times
$t_1,\dots,t_n \in [0,\infty)$,

$E[f(X_{t_1}, X_{t_1+t_2}, \dots, X_{t_1+\cdots+t_n})] = \int_\Psi \int_\Psi \cdots \int_\Psi f(x_1,\dots,x_n)\,\mu_{x_{n-1},t_n}(dx_n) \cdots \mu_{x_0,t_1}(dx_1)\,Q_0(dx_0)$.

Hint: One easy way to do this is to show that for any infinite sequence $t_1, t_2, \dots$
of times, the random sequence $(X_0, X_{t_1}, X_{t_1+t_2}, \dots)$ is a Markov sequence (not
necessarily time-homogeneous), with transition functions $R_i(x,\cdot) = \mu_{x,t_i}(\cdot)$, and
then use Theorem 3 or Problem 3, both of Chapter 22.


Problem 11. Let $X$ be a Markov process with respect to a filtration $(\mathcal{F}_t\colon t \in
[0,\infty))$, and suppose that a Markov family $(Q^x\colon x \in \Psi)$ exists corresponding to
the transition semigroup of $X$. Fix $s \in [0,\infty)$, and define $Y$ by $Y_t = X_{s+t}$. Show
that the conditional distribution of $Y$ given $\mathcal{F}_s$ is $Q^{X_s}$.


If we replace the time s in Definition 5 by an a.s.-finite stopping time S, 
we obtain the definition of a strong Markov process. The reader may want to 
compare this definition with the ‘strong Markov property’ that was proved for 
Lévy processes in Chapter 30. 


Definition 9. Let $X$ be a Markov process with respect to a filtration $(\mathcal{F}_t\colon t \in
[0,\infty))$, with transition distributions $\mu_{x,t}$, $x \in \Psi$, $t \in [0,\infty)$. Then $X$ is strong
Markov with respect to the filtration $(\mathcal{F}_t\colon t \in [0,\infty))$ if for each a.s.-finite stopping
time $S$ with respect to that filtration and all $t \in [0,\infty)$, the conditional
distribution of $X_{S+t}$ given $\mathcal{F}_S$ is $\mu_{X_S,t}$. The distribution of a strong Markov
process is a strong Markov distribution. Markov families whose members are all
strong Markov are strong Markov families.


In the discrete-time setting, Markov and strong Markov are equivalent. Un- 
fortunately (see Problem 13), the same does not hold true in continuous time. 


Problem 12. State and prove strong Markov analogues of Proposition 7, of the 
equivalence among (i), (ii), and (iii) in Proposition 8, and of the result in Prob- 
lem 11. 


Problem 13. Let $\Psi$ be the following subset of $\mathbb{R}^2$:

$\{(x,y)\colon y = 0 \text{ or } x^2 + (y-1)^2 = 1\}$.

Thus, $\Psi$ is the union of the $x$-axis and the circle with radius 1 and center $(0,1)$.
Define $\varphi\colon \mathbb{R} \to \Psi$ by

$\varphi(x) = \begin{cases} (x, 0) & \text{if } x < 0 \\ (\sin x, 1 - \cos x) & \text{if } 0 \le x \le 2\pi \\ (x - 2\pi, 0) & \text{otherwise.} \end{cases}$

Let $W$ be a Wiener process on $[0,\infty)$, and define $X$ by

$X_t = \varphi(W_t + \pi)$.

Show that $X$ is Markov, but not strong Markov. Hint: To show that $X$ is Markov,
use the fact that $\varphi$ is invertible except at $(0,0)$. To show that $X$ is not strong
Markov, consider the hitting time of the point $(0,0)$.


We conclude this section with a discussion of an important special class of 
transition semigroups. This class has two nice properties: (i) every member of 
it corresponds to a Markov family and (ii) every such Markov family is strong 
Markov. 


Definition 10. A transition semigroup $(T_t\colon t \in [0,\infty))$ is a Feller semigroup
if for all times $t \in [0,\infty)$ and continuous functions $f\colon \Psi \to \mathbb{R}$ that vanish at $\infty$,
$T_t f$ is a continuous function that vanishes at $\infty$. A Markov process is a Feller
process if its transition semigroup is a Feller semigroup.


Theorem 11. Let $X$ be a Markov process with respect to a filtration $(\mathcal{F}_t\colon t \in
[0,\infty))$. If $X$ is a Feller process, then $X$ is strong Markov with respect to the
filtration $(\mathcal{F}_{t+}\colon t \in [0,\infty))$.


PROOF. By the strong Markov version of the equivalence between (i) and (iii)
in Proposition 8 (see Problem 12), it is enough to prove that

(31.3)   $E(f \circ X_{S+t} \mid \mathcal{F}_S) = T_t f(X_S)$

for all a.s.-finite stopping times $S$ with respect to the given filtration, all times
$t \in [0,\infty)$, and all bounded continuous functions $f\colon \Psi \to \mathbb{R}$. Fix such $S$, $t$, and
$f$.

For $\delta > 0$, let

$S_\delta(\omega) = \inf\{u > S(\omega)\colon u = k\delta \text{ for some nonnegative integer } k\}$.

Then each $S_\delta$ is almost surely finite, and $S_\delta \to S$ a.s. as $\delta \searrow 0$.
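The discretization used here is elementary: $S_\delta$ is the smallest multiple of $\delta$ strictly greater than $S$, so that $S < S_\delta \le S + \delta$. A minimal sketch, using exact rational arithmetic to avoid rounding at grid points (the function name is ours):

```python
from fractions import Fraction
import math

def discretize(S, delta):
    """Smallest multiple of delta that is strictly greater than S."""
    return (math.floor(S / delta) + 1) * delta

S = Fraction(37, 100)
for delta in (Fraction(1, 2), Fraction(1, 10), Fraction(1, 100)):
    S_delta = discretize(S, delta)
    # Each value lies in (S, S + delta], so S_delta -> S as delta decreases to 0.
    print(delta, S_delta, S < S_delta <= S + delta)
```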
By Proposition 8, for each $\delta > 0$, the random sequence

$(X_0, X_\delta, X_{2\delta}, \dots)$

is Markov with respect to the filtration $(\mathcal{F}_{n\delta}\colon n = 0,1,2,\dots)$, with transition
operator $T_\delta$. By Theorem 7 of Chapter 26 (the strong Markov property of Markov
sequences),

$E(f \circ X_{S_\delta+t} \mid \mathcal{F}_{S_\delta}) = T_t f(X_{S_\delta})$.

Since $\mathcal{F}_S \subseteq \mathcal{F}_{S_\delta}$, it follows that for any $B \in \mathcal{F}_S$,

$E(f(X_{S_\delta+t}) I_B) = E(T_t f(X_{S_\delta}) I_B)$.

Now let $\delta \searrow 0$ and use the right-continuity of $X$ and the continuity of $T_t f$ to
obtain

$E(f(X_{S+t}) I_B) = E(T_t f(X_S) I_B)$.

Since $T_t f(X_S)$ is clearly $\mathcal{F}_S$ measurable, (31.3) follows. □


We omit the proof of the following important characterization theorem, but 
comment that in a proof one would only need to show ‘Markov’ since then ‘strong 
Markov’ would follow from Theorem 11. 


Theorem 12. Each Feller semigroup is the transition semigroup of a strong 
Markov family. 


31.3. Infinitesimal generators 


An important aspect of the theory of Markov processes is the development of 
methodology for understanding global behavior in terms of an operator that 
describes local behavior. In the following definition, we use the term converges 
boundedly to describe a pointwise convergent sequence of R-valued functions 
whose absolute values are all bounded above by a single finite constant. Similarly 
we speak of a limit existing boundedly. 


Definition 13. Let $(T_t\colon t \in [0,\infty))$ be a transition semigroup for a Polish
space $\Psi$. The operator $G$ defined by

(31.4)   $Gf(x) = \lim_{t \searrow 0} \dfrac{T_t f(x) - f(x)}{t}$   boundedly

is the infinitesimal generator of the transition semigroup $(T_t\colon t \in [0,\infty))$ and
of any Markov processes and families that might correspond to this transition
semigroup; the domain of $G$ consists of those bounded measurable functions $f$
for which the limit at (31.4) exists (boundedly).


Even though $G$ is defined in terms of the transition semigroup $(T_t)$, it is
possible in some situations that one will know $G$ without knowing the transition
semigroup. For instance, one might only know an approximation of $T_t$, but this
approximation may be good enough for small $t$ so that an exact formula for $G$
can be obtained. The following theorem gives equations ‘of differential type’ that
one might hope to solve for the function $t \mapsto T_t$ if $G$ is already known.


Theorem 14. Let $(T_t\colon t \in [0,\infty))$ be a transition semigroup and $G$ the corresponding
infinitesimal generator for a Polish space $\Psi$. Then for all $f$ in the
domain of $G$,

(31.5)   $T_t G f(x) = \lim_{h \to 0} \dfrac{T_{t+h} f(x) - T_t f(x)}{h} = G T_t f(x)$

boundedly (with the understanding that $h > 0$ if $t = 0$).


PARTIAL PROOF. In terms of the transition distributions $\mu_{x,t}$,

(31.6)   $T_t G f(x) = \int_\Psi \lim_{h \searrow 0} \dfrac{T_h f(y) - f(y)}{h}\,\mu_{x,t}(dy)$.

Since $f$ is in the domain of $G$, the bounded convergence theorem applies. Thus
the first equality in (31.5) holds with $h \searrow 0$ in lieu of $h \to 0$. The existence
of this limit implies, by definition, that $T_t f$ is in the domain of $G$ and that
the second equality holds. The reader is asked in Problem 14 to replace the
restriction $h \searrow 0$ with $h \to 0$. □


Problem 14. Complete the preceding proof. 


* Problem 15. Describe the infinitesimal generator of a compound Poisson process 
in terms of its Lévy measure. 


Problem 16. Let $G$ be the infinitesimal generator of a Markov family. Show that
if $Q_0$ is an equilibrium distribution for the Markov family, then

$\int_\Psi Gf(x)\,Q_0(dx) = 0$

for all functions $f$ in the domain of $G$.


Problem 17. Let $\Psi$ be a Polish space and $G$ the infinitesimal generator of a transition
semigroup $(T_t\colon t \in [0,\infty))$ for $\Psi$. Suppose that $G$ is a bounded operator on
the space of bounded measurable functions $f\colon \Psi \to \mathbb{R}$, meaning that there exists
a finite constant $c$ such that

$\sup_{x \in \Psi} |Gf(x)| \le c \sup_{x \in \Psi} |f(x)|$

for all such $f$. Show that the transition semigroup is uniquely determined by the
formula

$T_t = e^{tG} \overset{\mathrm{def}}{=} I + tG + \dfrac{t^2\,G \circ G}{2!} + \dfrac{t^3\,G \circ G \circ G}{3!} + \cdots$,   $t \in [0,\infty)$.

Hint: Under the given hypotheses, Theorem 14 implies that the function $t \mapsto
G T_t f(x)$ has derivatives of all order.
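For a finite state space every generator is a bounded operator, and the exponential series can be summed directly. The sketch below takes, as an assumed example, a two-state chain with hypothetical jump rates `a` (from state 0 to 1) and `b` (from 1 to 0), forms partial sums of the series, and checks the semigroup property, a known closed form for this chain, and the equilibrium identity $\int Gf\,dQ_0 = 0$. It illustrates the formula and does not prove the uniqueness asked for.

```python
import math

def mat_mul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def mat_exp(G, t, terms=60):
    """Partial sum of I + tG + t^2 G^2 / 2! + ... for a square matrix G."""
    n = len(G)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    power = [row[:] for row in result]            # holds t^k G^k / k!
    for k in range(1, terms):
        power = [[x * t / k for x in row] for row in mat_mul(power, G)]
        result = [[r + p for r, p in zip(rr, pr)] for rr, pr in zip(result, power)]
    return result

a, b = 2.0, 0.5                                   # hypothetical jump rates
G = [[-a, a], [b, -b]]
Ts, Tt, Tst = mat_exp(G, 0.3), mat_exp(G, 1.1), mat_exp(G, 1.4)
closed = b / (a + b) + a / (a + b) * math.exp(-(a + b) * 1.4)   # entry (0,0) of T_{1.4}
print(mat_mul(Ts, Tt)[0][0], Tst[0][0], closed)

pi = [b / (a + b), a / (a + b)]                   # equilibrium distribution
f = [3.0, -1.0]
Gf = [sum(G[i][j] * f[j] for j in range(2)) for i in range(2)]
print(sum(p * g for p, g in zip(pi, Gf)))         # integral of Gf against pi
```

The three numbers on the first printed line should coincide, and the second printed value should vanish up to rounding.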


31.4. The martingale problem 
We adapt the main definition of Chapter 24 to the continuous-time setting. 


Definition 15. A $D([0,\infty),\mathbb{R})$-valued random variable $Z$ adapted to a filtration
$(\mathcal{F}_t\colon t \ge 0)$ is a continuous-time martingale with respect to that filtration if
$E|Z_t| < \infty$ for all $t \in [0,\infty)$, and

$E(Z_{t+s} \mid \mathcal{F}_s) = Z_s$

for all $s,t \in [0,\infty)$.


Problem 18. For $X$ a Lévy process with $E|X_1| < \infty$, show that $t \mapsto (X_t - tEX_1)$
is a continuous-time martingale with respect to the minimal filtration.




It is quite easy to adapt the main results for martingales in Chapter 24 to 
the continuous-time setting. Whenever we cite results from that chapter in a 
continuous-time context, we will assume that such an adaptation has been made. 


Definition 16. Let $\Psi$ be a Polish space, $\mathfrak{F}$ a collection of bounded continuous
functions $f\colon \Psi \to \mathbb{R}$, and $G$ a functional from $\mathfrak{F}$ to the space of bounded
measurable functions on $\Psi$. A $D([0,\infty),\Psi)$-valued random variable $X$ defined
on $\Omega$ is a solution to the martingale problem for $(G,\mathfrak{F})$ if

(31.7)   $t \mapsto f(X_t) - \int_0^t Gf(X_u)\,du$

is a continuous-time martingale with respect to the minimal filtration of $X$ for
all $f \in \mathfrak{F}$.


The random variable defined by (31.7) is automatically $D([0,\infty),\mathbb{R})$-valued,
due to the assumption that $f$ is continuous.

For martingales and solutions to the martingale problem, we speak of initial 
states just as for Markov processes. 

Denote the minimal filtration of $X$ in Definition 16 by $(\mathcal{F}_t\colon t \in [0,\infty))$. Since
$\int_0^s Gf(X_u)\,du$ is measurable with respect to $\mathcal{F}_s$, the statement that the expression
in (31.7) is a martingale is equivalent to the following condition:

(31.8)   $E(f(X_{t+s}) \mid \mathcal{F}_s) - E\Bigl(\int_s^{t+s} Gf(X_u)\,du \Bigm| \mathcal{F}_s\Bigr) = f(X_s)$,   $s,t \in [0,\infty)$.


The following two theorems, which are close to being converses of each other, 
relate Markov families to solutions to the martingale problem. 


Theorem 17. Let $G$ be the infinitesimal generator of a Markov process $X$
with state space $\Psi$, and let $\mathfrak{F}$ denote a subset of the domain of $G$
consisting only of continuous functions. Then $X$ is a solution to the martingale
problem for $(G,\mathfrak{F})$.


PROOF. Let $X$ be any member of the Markov family. To prove that it solves
the martingale problem we will calculate the left side of (31.8) and show that it
equals $f(X_s)$. By the Conditional Bounded Convergence Theorem, the left side
of (31.8) equals

$$T_t f(X_s) - \int_s^{s+t} T_{u-s}Gf(X_s)\,du,$$

which by Theorem 14 equals

$$T_t f(X_s) - \int_0^t T_v Gf(X_s)\,dv = T_0 f(X_s) = f(X_s) \quad \text{a.s.} \qquad \square$$




Theorem 18. Let $G$ and $\mathfrak{F}$ be as in Definition 16. Suppose that for each
$x \in \Psi$, there is a unique cadlag solution to the martingale problem for
$(G,\mathfrak{F})$ with initial state $x$. Then the collection of solutions obtained by
varying $x$ over $\Psi$ is a strong Markov family whose infinitesimal generator agrees
with $G$ on $\mathfrak{F}$.


PROOF. Let $X^x$ be the unique solution to the martingale problem with initial
state $x$. Now fix $x$, let $T$ be a stopping time for $X = X^x$, and set $Y_t = X_{T+t}$.
Let $f \in \mathfrak{F}$ and consider the random function

$$t \rightsquigarrow f(Y_t) - \int_0^t Gf(Y_u)\,du, \tag{31.9}$$

which is adapted to the filtration $(\mathcal{F}_{T+t}\colon t \in [0,\infty))$. For $0 \le s < t$,

$$E\Bigl(f(Y_t) - \int_0^t Gf(Y_u)\,du \Bigm| \mathcal{F}_{T+s}\Bigr)
  = E\Bigl(f(X_{T+t}) - \int_0^{T+t} Gf(X_u)\,du \Bigm| \mathcal{F}_{T+s}\Bigr)
  + E\Bigl(\int_0^{T} Gf(X_u)\,du \Bigm| \mathcal{F}_{T+s}\Bigr),$$

which by the Optional Sampling Theorem equals

$$f(X_{T+s}) - \int_0^{T+s} Gf(X_u)\,du + \int_0^{T} Gf(X_u)\,du
  = f(Y_s) - \int_0^{s} Gf(Y_u)\,du.$$

Thus the conditional distribution of $Y$ given $\mathcal{F}_T$ solves the martingale problem
with initial state $X_T$. Since the solution to the martingale problem is unique,
this conditional distribution must be that of $X^{X_T}$, as desired. $\square$


The key to the preceding proof is the Optional Sampling Theorem. The 
availability of this theorem is one reason the martingale problem is so useful for 
treating Markov processes. There are two other significant reasons for using the 
martingale problem to study Markov processes. The first of these is obvious: it 
provides a large collection of martingales that can be used to analyze a Markov 
process. The other reason for studying the martingale problem has to do with 
sequences of Markov distributions. It is often hard to show directly that the 
limit of such a sequence is itself a Markov distribution. But it turns out to be 
quite easy in most cases to show that such a limit is a solution to a martingale 
problem. Then Theorem 18 can often be used to see that the limit is strong 
Markov. 


Problem 19. [Dynkin formula] Let $X$ be a Markov process with infinitesimal
generator $G$, and let $U \le V$ be a.s.-finite stopping times for the minimal filtration
of $X$. Show that for every continuous function $f$ in the domain of $G$,

$$E[f(X_V) - f(X_U)] = E\Bigl(\int_U^V Gf(X_u)\,du\Bigr).$$




31.5. Pure-jump Markov processes: bounded rates 


To illustrate the ideas introduced in preceding sections, we now construct some 
Markov families with infinitesimal generators that are bounded operators on the 
space of bounded measurable functions. 

Let $\Psi$ be a Polish space, $\tilde{T}$ a transition operator for $\Psi$ with discrete
generator $\tilde{G} = \tilde{T} - I$, $x_0$ a state in $\Psi$, and $c$ a positive constant. We will
construct a Markov process $X$ with infinitesimal generator $G = c\tilde{G}$ and initial
state $x_0$. Our ingredients are a Markov sequence $\tilde{X}$ with initial state $x_0$ and
transition operator $\tilde{T}$ and a random walk $S = (S_n\colon n = 0,1,2,\dots)$ having
exponentially distributed steps with parameter $c > 0$ (the reciprocal of the mean).
Assume that $\tilde{X}$ and the sequence $S$ are independent of each other. For
$t \in [0,\infty)$, define

$$M_t = \sup\{n \ge 0 \colon S_n \le t\}.$$

(Those who have read Chapter 30 will recognize $M$ as a standard Poisson process
with Lévy measure $c\delta_1$.)

The definition of $X$ is quite simple:

$$X_t = \tilde{X}_{M_t}, \qquad t \in [0,\infty).$$


Roughly speaking, the discrete-time random sequence $\tilde{X}$ has been converted to a
continuous-time stochastic process $X$ by using the Poisson process $M$ to measure
time.
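
As a rough illustration of this time change, the following Python sketch samples one path on a bounded time interval. It is not taken from the text: the function names, the `step` callback (a hypothetical helper that draws one transition of the Markov sequence), and the use of a finite horizon are all assumptions made for the example.

```python
import random
from bisect import bisect_right

def simulate_pure_jump(step, x0, c, horizon, rng):
    """Sample the path of X_t = X~_{M_t} on [0, horizon].

    step(x, rng) is a hypothetical helper drawing one transition of the
    Markov sequence X~; c is the parameter of the exponential steps of S.
    Returns (times, states) with times[k] = S_k and states[k] = X~_k.
    """
    times, states = [0.0], [x0]          # S_0 = 0 and X~_0 = x0
    s = rng.expovariate(c)               # S_1
    x = x0
    while s <= horizon:
        x = step(x, rng)                 # next term of the Markov sequence
        times.append(s)
        states.append(x)
        s += rng.expovariate(c)          # S_{n+1} = S_n + Exp(c) step
    return times, states

def value_at(times, states, t):
    """Evaluate the cadlag path: M_t = sup{n : S_n <= t}, X_t = X~_{M_t}."""
    return states[bisect_right(times, t) - 1]
```

Every step of the sequence is recorded, including steps that do not change the state, so the index of the last recorded time not exceeding `t` is exactly the Poisson count at `t`.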

It can be shown directly that the process X just defined is strong Markov, and 
as such it is called a pure-jump Markov process with bounded rates. Rather than 
the direct approach of showing that X is strong Markov, we will use Poisson point 
processes to give an alternative construction of a $D([0,\infty),\Psi)$-valued random
variable Y that has the same distribution as X. It will be easy to see from this 
construction that Y is strong Markov. An advantage of this second approach 
is that it is ‘universal’, in the sense that all pure-jump Markov processes with 
bounded rates are constructed on the same probability space. In Chapter 32, a 
similar construction is used for ‘interacting particle systems’, so our work here 
is a warm-up for that chapter. 

The basis for our alternative construction is a Poisson point process $Z$ on
$(0,\infty) \times [0,1]$, with intensity measure $\lambda = \lambda_1 \times \lambda_2$, where $\lambda_1$ is
Lebesgue measure on $(0,\infty)$ and $\lambda_2$ is Lebesgue measure on $[0,1]$. By Problem 24
of Chapter 29, we may write $Z = \{(U_n,V_n)\colon n = 1,2,\dots\}$, where the sequences
$U = (U_n\colon n = 1,2,\dots)$ and $V = (V_n\colon n = 1,2,\dots)$ are independent. That problem
also implies that the sequence $(0,U_1,U_2,\dots)$ may be chosen to be a random walk
with steps that are exponentially distributed with mean 1, and that $V$ is an iid
sequence of random variables that are uniformly distributed on $[0,1]$. We will use
the sequences $U$ and $V$ to construct a pair $(\tilde{Y},N)$ that has the same distribution
as $(\tilde{X},M)$. Once we have accomplished that task, the construction is completed by
defining

$$Y_t = \tilde{Y}_{N_t}, \qquad t \in [0,\infty).$$




Since $(\tilde{Y},N)$ and $(\tilde{X},M)$ have the same distribution, the $D([0,\infty),\Psi)$-valued
random variable $Y = (Y_t\colon t \in [0,\infty))$ has the same distribution as $X$.

The definition of $N$ is quite simple:

$$N_t = \sup\{n \ge 0 \colon U_n \le ct\},$$

where $U_0 = 0$. It is easy to see that $N$ has the same distribution as $M$.

Before defining $\tilde{Y}$, we temporarily restrict ourselves to the special case in
which $\Psi = \mathbb{R}$. For this case, let $\mu_x$, $x \in \mathbb{R}$, be the transition distributions
associated with the transition operator $\tilde{T}$, and let $F_x$, $x \in \mathbb{R}$, be the
corresponding distribution functions. Define functions $f_x\colon [0,1] \to \mathbb{R}$ by

$$f_x(u) = \inf\{y \in \mathbb{R} \colon F_x(y) \ge u\}.$$

Thus, $f_x$ is the left-continuous inverse of $F_x$, and by Proposition 4 of Chapter 3,
if $V$ is any random variable that is uniformly distributed on $[0,1]$, then $f_x \circ V$
has distribution $\mu_x$.

We now define $\tilde{Y}$ inductively. Let $\tilde{Y}_0 = x_0$, and having defined $\tilde{Y}_n$ for
some $n \ge 0$, let

$$\tilde{Y}_{n+1} = f_{\tilde{Y}_n}(V_{n+1}).$$

It is easy to check that since $V$ is iid, $\tilde{Y}$ is a Markov sequence. Since the random
variables in the sequence $V$ are uniformly distributed on $[0,1]$, it is also easy
to check from the definition of the functions $f_y$ that $\tilde{Y}$ has transition operator
$\tilde{T}$. Since $U$ and $V$ are independent, $\tilde{Y}$ and $N$ are also independent. Thus, our
alternative construction is complete for the case $\Psi = \mathbb{R}$.
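
The inductive step can be sketched in Python for transition distributions with finite support. The names `left_inverse`, `markov_sequence`, and the dictionary `kernel` are hypothetical and chosen only for this illustration; the restriction to finite support is likewise an assumption.

```python
import random

def left_inverse(support, probs):
    """Return f with f(u) = inf{y : F(y) >= u} for a distribution on a
    finite support listed in increasing order.

    At u = 0 the infimum is formally minus infinity; this sketch returns the
    smallest support point instead, which only affects a null event.
    """
    def f(u):
        cum = 0.0
        for y, p in zip(support, probs):
            cum += p                     # cum = F(y)
            if cum >= u:
                return y
        return support[-1]               # guard against round-off near u = 1
    return f

def markov_sequence(kernel, x0, uniforms):
    """Build the sequence with first term x0 and next term f_x(v), where x is
    the current term and v runs through the given uniform variates."""
    path = [x0]
    for v in uniforms:
        support, probs = kernel[path[-1]]
        path.append(left_inverse(support, probs)(v))
    return path
```

For instance, `markov_sequence(kernel, 0, [random.random() for _ in range(10)])` would produce the first eleven terms of one sample sequence when `kernel` maps each state to a pair `(support, probs)`.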

To generalize to the case of an arbitrary Polish state space $\Psi$, simply use the
fact that since Polish spaces are Borel (see Proposition 20 of Chapter 21), there
is an isomorphism $g$ from $\Psi$ to $\mathbb{R}$. This isomorphism transforms the transition
distributions associated with $\tilde{T}$ to the transition distributions of a transition
operator for $\mathbb{R}$. Carry out the preceding construction for these transformed
transition distributions, then apply the function $g^{-1}$ to the result to obtain the
appropriate process with state space $\Psi$.

Let us summarize what we have done so far. For each initial state $x \in \Psi$, we
have defined $D([0,\infty),\Psi)$-valued random variables $X^x$ and $Y^x$ with the same
distribution. The random variable $X^x$ was constructed directly in terms of a
Markov sequence $\tilde{X}$ and a standard Poisson process $M$. The random variable
$Y^x$ was constructed by first defining a Poisson point process $Z$, and then using
$Z$ to construct a pair $(\tilde{Y},N)$ having the same distribution as $(\tilde{X},M)$. This
construction is measurable in $Z$ and $x$, in the sense that there is a measurable
function $h$ on $\{\text{discrete subsets of } (0,\infty)\times[0,1]\} \times \Psi$ such that $Y^x = h(Z,x)$.
(See Chapter 29 for the interpretation of $Z$ as a random discrete subset of
$(0,\infty)\times[0,1]$.) The notation introduced in Problem 17 is relevant for the
following result.


Theorem 19. Let $(X^x\colon x \in \Psi)$ and $(Y^x\colon x \in \Psi)$ be the two families of
processes defined above in terms of a transition operator $\tilde{T}$ with discrete generator
$\tilde{G}$. Then for each $x \in \Psi$, $X^x$ and $Y^x$ have the same distribution and are strong
Markov. The corresponding Markov family of distributions has generator
$G = c\tilde{G}$, with transition semigroup $(T_t\colon t \in [0,\infty))$ determined from $G$ via

$$T_t = e^{tG} = e^{ct\tilde{G}} = e^{-ct}e^{ct\tilde{T}}, \qquad t \in [0,\infty).$$


PROOF. We have already shown that $X^x$ and $Y^x$ have the same distribution
for each $x \in \Psi$. We denote this distribution by $Q^x$. It is easy to check from the
construction that the collection $(Q^x\colon x \in \Psi)$ satisfies the first two conditions of
Proposition 7.

We now check the third condition of Proposition 7. Thinking of $Z$ as a random
discrete subset of $(0,\infty) \times [0,1]$, let

$$\mathcal{F}_t = \sigma\bigl(Z \cap ((0,t] \times [0,1])\bigr), \qquad t \in [0,\infty).$$

It follows from the construction that each process $Y^x$ is adapted to the filtration
$(\mathcal{F}_t)$. Thus, it is sufficient to show for each $s \in [0,\infty)$ and $x \in \Psi$, that the
conditional distribution of $t \rightsquigarrow Y^x_{s+t}$ given $\mathcal{F}_s$ is $Q^{Y^x_s}$.

Let $Z^s$ be the point process obtained from $Z$ by subtracting $s$ from the first
coordinate of each of the points in $Z \cap ((s,\infty) \times [0,1])$. The construction implies
that the process $t \rightsquigarrow Y^x_{s+t}$ is given by $h(Z^s,Y^x_s)$, where $h$ is the function defined
prior to the statement of the theorem. We want to show that the conditional
distribution of $h(Z^s,Y^x_s)$ given $\mathcal{F}_s$ is $Q^{Y^x_s}$.

Since the basic properties of Poisson point processes imply that $Z^s$ is independent
of $\mathcal{F}_s$ and since $Y^x_s$ is $\mathcal{F}_s$-measurable, it follows from Problem 21,
Proposition 11, and Proposition 12, all of Chapter 21, that the conditional distribution
of $h(Z^s,Y^x_s)$ given $\mathcal{F}_s$ is $R^{Y^x_s}$, where for each $y \in \Psi$, $R^y$ is the distribution of
$h(Z^s,y)$. It is easy to see that $Z^s$ and $Z$ have the same distribution, so $Q^y = R^y$
for all $y \in \Psi$, as desired.

We have shown that $(Q^x\colon x \in \Psi)$ is a Markov family. The proof that it is
strong Markov is very similar to the proof of Theorem 11. Following the notation
in that proof, the only difference is that we do not rely on the continuity of
$T_t f$ near the end of the proof to show that $T_t f(X_{S_\delta}) \to T_t f(X_S)$ as $\delta \searrow 0$.
Instead, our construction shows that the event $[X_{S_\delta} = X_S]$ contains the event
$[M_{S_\delta} = M_S]$, which increases to all of $\Omega$ as $\delta \searrow 0$. The rest of the proof of
Theorem 11 can be followed without change.

To calculate the infinitesimal generator and transition semigroup, we return
to the construction at the beginning of this section. Let $\tilde{X}$ and $M$ be the
Markov sequence and standard Poisson process used there to construct $X$. By
this construction,

$$E(f \circ X_t) = \sum_{k=0}^{\infty} E(f(\tilde{X}_k))\,P[M_t = k]$$

for any bounded measurable $f\colon \Psi \to \mathbb{R}$. Since $M$ is a standard Poisson process
and $EM_1 = c$, the random variable $M_t$ is Poisson distributed with mean $ct$.




Thus,

$$P[M_t = k] = \frac{c^k t^k}{k!}\,e^{-ct}.$$

Since $\tilde{X}$ is a Markov sequence with transition operator $\tilde{T}$ and initial state $x$,

$$E(f(\tilde{X}_k)) = \tilde{T}^k f(x).$$

Since $f$ and $x$ are arbitrary, the formula $T_t = e^{-ct}e^{ct\tilde{T}}$ follows immediately,
and the formula $T_t = e^{tG}$ then follows from standard manipulations involving
power series. That $G$ is the corresponding infinitesimal generator is now an easy
consequence of Definition 13. (See also Problem 17.) $\square$
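
On a finite state space the two expressions for the transition semigroup can be compared numerically. The sketch below is an illustration under assumptions, not part of the text: the transition operator is represented by a row-stochastic matrix, the exponential is computed by a truncated power series, and the function names are invented for the example.

```python
import math

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(a, terms=60):
    """Truncated power series I + A + A^2/2! + ... (adequate for small matrices)."""
    n = len(a)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]            # current term A^k / k!
    for k in range(1, terms):
        term = [[v / k for v in row] for row in mat_mul(term, a)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    return result

def semigroup_two_ways(T, c, t):
    """Return exp(tG) with G = c(T - I), and exp(-ct) exp(ctT), for comparison."""
    n = len(T)
    G = [[c * (T[i][j] - (i == j)) for j in range(n)] for i in range(n)]
    via_generator = mat_exp([[t * g for g in row] for row in G])
    poisson_mix = mat_exp([[c * t * p for p in row] for row in T])
    via_poisson = [[math.exp(-c * t) * v for v in row] for row in poisson_mix]
    return via_generator, via_poisson
```

The two returned matrices should agree up to round-off, and each row should sum to 1, reflecting that the semigroup consists of transition operators.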


To see the relevance of the modifying phrase ‘with bounded rates’ for the 
Markov processes we have been discussing, we write the corresponding infinites- 
imal generators G in a different form. 

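By Theorem 19,

$$Gf(x) = c\tilde{G}f(x) = c\int_{\Psi} [f(y) - f(x)]\,\mu_x(dy), \tag{31.10}$$

where $\mu_x$, $x \in \Psi$, are the transition distributions of the transition operator $\tilde{T}$.
For each $x \in \Psi$, let

$$q(x) = c\,\mu_x(\Psi \setminus \{x\}).$$

If $q(x) > 0$, denote by $p_x$ the probability measure on $\Psi$ defined by

$$p_x(A) = \frac{c\,\mu_x(A \setminus \{x\})}{q(x)}$$

for Borel sets $A \subseteq \Psi$. If $q(x) = 0$, let $p_x = \delta_x$. Now (31.10) can be rewritten as

$$Gf(x) = q(x)\int_{\Psi} [f(y) - f(x)]\,p_x(dy) \tag{31.11}$$

for all $x \in \Psi$ and bounded measurable $f\colon \Psi \to \mathbb{R}$.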

The number $q(x)$ in the formula for $G$ is the jump rate at $x$ of the corresponding
Markov family and $p_x$ is the jump distribution from $x$. The function
$x \rightsquigarrow q(x)$, denoted by $q$, is the jump-rate function. For an explanation of this
terminology, see Problem 20. Note that our construction ensures that the jump-rate
function is bounded above by $c$, thus explaining why these processes are said
to have bounded rates. In the next section, we will discuss pure-jump Markov
processes with unbounded rates.
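
For a countable state space the passage from the constant $c$ and the transition distributions to jump rates and jump distributions is a short computation. The following Python sketch assumes, for illustration only, that the transition distributions are given as nested dictionaries; the function name `jump_rates` is hypothetical.

```python
def jump_rates(T, c):
    """From transition probabilities T[x][y] (mass that the transition
    distribution at x gives to y) and the rate c, compute the jump rate
    q(x) = c * (mass off x) and the jump distribution p_x."""
    q, p = {}, {}
    for x, row in T.items():
        off = {y: m for y, m in row.items() if y != x and m > 0}
        q[x] = c * sum(off.values())
        if q[x] > 0:
            # p_x(A) = c * (mass of A minus {x}) / q(x)
            p[x] = {y: c * m / q[x] for y, m in off.items()}
        else:
            p[x] = {x: 1.0}              # p_x is the point mass at x when q(x) = 0
    return q, p
```

Since the mass off $x$ never exceeds 1, every computed rate is at most `c`, which is the boundedness noted above.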


Problem 20. For $X$ as in Theorem 19 with initial state $x$, let $J = \inf\{t\colon X_t \ne x\}$.
Suppose that $q(x) > 0$. Prove that the distribution of $J$ is exponential with mean
$1/q(x)$, that $J$ and $X_J$ are independent, and that the distribution of $X_J$ is $p_x$.


Problem 21. Show that $Q_0$ is an equilibrium distribution for the transition
semigroup defined in Theorem 19 if and only if it is an equilibrium distribution for the
corresponding transition operator $\tilde{T}$.




Problem 22. Let $\tilde{T}$ be a transition operator for $\Psi$. Show that if $\tilde{T}$ takes
continuous functions vanishing at $\infty$ to continuous functions vanishing at $\infty$, then the
transition semigroup defined in Theorem 19 is Feller.


Problem 23. For a pure-jump Markov family with bounded rates and countable
state space $\Psi$, let $p_{xy}(t)$ denote the probability that the process with initial state
$x$ is at state $y$ at time $t$, and for $y \ne x$, let $q_{xy} = q(x)\,p_x\{y\}$. [The numbers $p_{xy}(t)$
are the transition probabilities from $x$ to $y$ and $q_{xy}$ is the transition rate from $x$ to
$y$.] Prove that

$$-p_{xy}(t)\,q(y) + \sum_{z \ne y} p_{xz}(t)\,q_{zy} = \frac{d}{dt}\,p_{xy}(t) = -p_{xy}(t)\,q(x) + \sum_{z \ne x} q_{xz}\,p_{zy}(t) \tag{31.12}$$

for all $x$, $y$, and $t$.


Problem 24. Use the result in Problem 16 to find all equilibrium distributions for 
an arbitrary pure-jump Markov family with state space {0, 1}. Express your answer 
in terms of the transition rates. 


In the case of a finite state space, Problem 23 gives us two systems of differ- 
ential equations for the transition probabilities. When the state space is small 
enough, elementary methods can be used to solve either system of equations 
explicitly for these functions in terms of the transition rates (see Problem 25). 
Even in the case of a countably infinite state space, it may be possible to solve 
one or both of the two systems. 
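
When no closed form is at hand, either system can also be integrated numerically. The Python sketch below uses a crude Euler scheme on a finite state space; it is an illustration under assumptions rather than a method from the text, and the names `rate_matrix` and `euler_solve` are hypothetical. The matrix `Q` collects the transition rates off the diagonal and the negated jump rates on the diagonal.

```python
def rate_matrix(qxy):
    """Q[x][y] = q_xy for y != x, and Q[x][x] = -q(x), where q(x) is taken
    to be the sum of q_xy over y != x."""
    n = len(qxy)
    return [[qxy[x][y] if x != y else -sum(qxy[x][z] for z in range(n) if z != x)
             for y in range(n)] for x in range(n)]

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def euler_solve(Q, t, steps, forward=True):
    """Euler scheme for P' = P Q (first system) or P' = Q P (second system),
    started from the identity matrix at time 0."""
    n = len(Q)
    h = t / steps
    P = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(steps):
        D = mat_mul(P, Q) if forward else mat_mul(Q, P)
        P = [[P[i][j] + h * D[i][j] for j in range(n)] for i in range(n)]
    return P
```

With the same step size both schemes compute powers of the same matrix, so their outputs agree up to round-off; the accuracy relative to the true transition probabilities improves only linearly in the step size, so a higher-order integrator would be preferable for serious use.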


* Problem 25. Use Problem 23 to find an explicit formula for the transition semi- 
group of an arbitrary pure-jump Markov process with the two-point state space 
{0,1}. Express your answer in terms of the transition rates. Then make an ap- 
propriate calculation to check for consistency between your answer to this problem 
and your answer to Problem 24. 


31.6. Pure-jump Markov processes: unbounded rates 


We now relax the assumption that the jump rate function $q$ be bounded and
replace it by the assumption that $q$ is bounded on each compact subset of the
state space $\Psi$, which we assume to be locally compact. In particular Lemma 1
of Chapter 29 implies that there exists an increasing sequence of compact sets
$A_n$, $n = 1,2,\dots$, such that $A_n \nearrow \Psi$.

Under the assumptions just made, it should be clear that for each $A_n$, we can
use either of the two constructions of the preceding section to define a process $X$
that ‘behaves like’ a pure-jump Markov process with jump rate function $q$ until
the first time $U_n$ that it jumps to a state in $A_n^c$. The increasing sequence $(U_n)$
may or may not approach $\infty$ as $n \to \infty$. Whether it does or not, let $U_\infty$ denote
the (random) limit. If $U_\infty < \infty$, we say that an explosion occurs. A standard
method for dealing with explosions is to adjoin a special state $\Delta$ to the state
space $\Psi$, and then let

$$X_t = \Delta \quad \text{for } t \ge U_\infty$$

if the initial state is different from $\Delta$ and $X_t = \Delta$ for all $t$ if the initial state is
$\Delta$. Of course, one does not need to know that there is an explosion in order to
adjoin $\Delta$ to $\Psi$. If there is no explosion, $\Delta$ is merely an ‘extra state’, with jump
rate $q(\Delta) = 0$.

By definition the neighborhoods of $\Delta$ are taken to be those sets whose
complements in $\Psi \cup \{\Delta\}$ are compact. What we have described is the one-point
compactification of $\Psi$ (see Appendix C). Our assumption that $q$ is bounded on
compact subsets of $\Psi$ can be shown to lead to the conclusion that the construction
described above leads to a $D([0,\infty),\Psi \cup \{\Delta\})$-valued strong Markov family.
The main difference between the bounded and unbounded case is that the domain
of the infinitesimal generator no longer contains all bounded measurable
functions, and $e^{tG}$ cannot be used as a formula for the transition semigroup.
Indeed, the issue of explicitly describing the domain of $G$ is quite complex.

A process constructed as described above is called a pure-jump Markov pro- 
cess. Some may use the term to describe the process only up to the explosion 
time, thereby making the time domain of the random function into a random 
variable. 


Example 1. [Pure-birth processes] Let $\Psi = \{0,1,2,\dots\}$ and let $p_x = \delta_{x+1}$
for $x \in \Psi$. This choice of jump distributions means that when the process is in
state $x$, its next jump is necessarily to state $x+1$. For this reason, jumps are
called ‘births’, and any pure-jump Markov process with such jump distributions
is called a pure-birth process. The jump rates $(q(x), x \in \Psi)$ are called birth
rates.

If the initial state is $x$ for a pure-birth process, then it is easy to see from the
construction that $U_\infty$ is a sum of independent exponentially distributed random
variables with parameters $q(x), q(x+1), q(x+2), \dots$. Thus

$$E(U_\infty) = \frac{1}{q(x)} + \frac{1}{q(x+1)} + \frac{1}{q(x+2)} + \cdots. \tag{31.13}$$

Calculation of the moment generating function of the $\overline{\mathbb{R}}^+$-valued random
variable $U_\infty$ (or alternatively the Three-Series Theorem) shows that $U_\infty < \infty$ with
probability 1 or 0 according as $E(U_\infty)$ is finite or infinite. Therefore there is an
explosion with probability 1 or 0 according as

$$\sum_{y=x}^{\infty} \frac{1}{q(y)} < \infty \quad \text{or} \quad = \infty.$$
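
The criterion can be explored numerically by truncating the series for the expected explosion time. The Python sketch below is an illustration only; the function names and the two sample birth-rate functions mentioned afterwards are assumptions, not examples from the text.

```python
import random

def expected_explosion_time(q, x, n_terms):
    """Partial sum 1/q(x) + 1/q(x+1) + ... with n_terms terms."""
    return sum(1.0 / q(y) for y in range(x, x + n_terms))

def sample_truncated_explosion_time(q, x, n_terms, rng):
    """Sum of the first n_terms independent exponential holding times, with
    parameters q(x), q(x+1), ...; a truncated sample of the explosion time."""
    return sum(rng.expovariate(q(y)) for y in range(x, x + n_terms))
```

With quadratic birth rates such as `q = lambda y: (y + 1) ** 2` the partial sums stay bounded (they approach $\pi^2/6$ when the initial state is 0), which corresponds to explosion with probability 1. With linear birth rates such as `q = lambda y: y + 1` the partial sums form the harmonic series and grow without bound, which corresponds to explosion with probability 0.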




Problem 26. [Birth-death processes] Let $\{q(x)\colon x \in \mathbb{Z}^+\}$ be jump rates, and let
$(p_x\colon x \in \mathbb{Z}^+)$ be jump distributions on $\mathbb{Z}^+$ such that $p_0 = \delta_1$, and for each $x > 0$,
$p_x$ is supported by $\{x-1, x+1\}$. Any pure-jump Markov process constructed
with such jump rates and jump distributions is called a birth-death process. The
quantities

$$\beta_x = q(x)\,p_x\{x+1\} \quad \text{and} \quad \delta_x = q(x)\,p_x\{x-1\}$$

are called, respectively, the birth rate and death rate at $x$. Show that the probability
of an explosion in a birth-death process is 0 for any initial state if the birth rates
satisfy

$$\sum_{x=0}^{\infty} \frac{1}{\beta_x} = \infty.$$

Hint: Construct the birth-death process $X$ jointly with a pure-birth process $Y$ with
the same birth rates in such a way that $X_t \le Y_t$ for all $t \in [0,\infty)$. That is, use a
coupling argument.


Problem 27. Consider a birth-death process $X$ with birth rates $\beta_x = x\beta$ and death
rates $\delta_x = x\delta$, $x = 0,1,2,\dots$, where $\beta,\delta$ are arbitrary nonnegative parameters.
Show that

$$E(X_t \mid X_0) = X_0\,e^{(\beta-\delta)t}.$$

Hint: Find a differential equation for the expected value on the left, as a function
of $t$.


Problem 28. Discuss how the issue of the domain of an infinitesimal generator is 
related to the issue of using Theorem 14 to obtain (31.12) for a pure-jump Markov 
process with rates that are not necessarily bounded. 


Problem 29. [Branching processes] For $x \in \mathbb{Z}^+$, define

$$q(x) = \gamma x$$

for some constant $\gamma > 0$. Let $p$ be a probability measure on $\mathbb{Z}^+ \setminus \{1\}$, and for
$x \in \mathbb{Z}^+$, define $p_x$ by the formula

$$p_x(A) = p(A - x + 1).$$

Processes constructed with such jump rates and jump distributions are called
branching processes. Show that if $p$ has finite support, then the probability of
an explosion is 0 for any initial state.


Problem 30. Show that a branching process with initial state x has the same dis- 
tribution as the sum of x independent branching processes with initial state 1. 


Problem 31. Let $X$ be a branching process with initial state $x$, defined in terms of
a measure $p$ with finite support, as in Problem 29. Let $V$ be the extinction time,
defined by

$$V = \inf\{t \ge 0 \colon X_t = 0\}.$$

Find a formula for $P[V < \infty]$ in terms of $p$. Hint: One approach is to focus on the
auxiliary Markov sequence $\tilde{Y}$ used in the construction of $X$.




Problem 32. Suppose that the assumption that p has finite support in the preced- 
ing problem is replaced by the assumption that p is such that the probability of 
an explosion is 0. Does the solution of that problem remain valid? Does it remain 
valid if all assumptions on p are dropped? 


31.7. †‡ Renewal theory for pure-jump Markov processes


For pure-jump Markov processes having bounded rates and countable state 
spaces, the concepts of ‘irreducible’ and ‘accessible’ carry over naturally from 
the Markov sequence setting. As we will see, ‘periodicity’ plays no role for the 
same reason that it plays no role for renewal processes. (See Theorem 8 of 
Chapter 30 for example.) 


Proposition 20. Let $X$ be a pure-jump Markov process with bounded rates
and initial state $x$. Then

$$t \rightsquigarrow I_{[X_t = x]}$$

is a renewal process that corresponds to a subordinator with drift 1 and finite
Lévy measure.


Problem 33. Prove the preceding proposition. 


The next corollary now follows immediately from Theorem 8 of Chapter 30. 


Corollary 21. Let $\nu$ be the finite Lévy measure of the subordinator
corresponding to a pure-jump Markov process $X$, as in Proposition 20. The set
$\{t\colon X_t = x\}$ is bounded with probability 1 or 0 according as $\nu\{\infty\} > 0$ or
$\nu\{\infty\} = 0$. Also

$$\lim_{t \to \infty} P[X_t = x] = \frac{1}{1 + \int_{(0,\infty]} y\,\nu(dy)}.$$

In view of Proposition 20 and Corollary 21, we carry over the terms ‘positive 
recurrent’, ‘null recurrent’, and ‘transient’ from their use for renewal processes 
as in Chapter 30 to states and irreducible classes of pure-jump Markov processes 
with bounded rates. In the case of countable state spaces equilibrium distribu- 
tions are related to the limits in Corollary 21 for various x in the same manner 
as for Markov sequences. 

For an arbitrary specific pure-jump Markov process with bounded rates and
nonrandom initial state $x$, one would like to identify the drift and Lévy measure of
the subordinator corresponding to the renewal process of Proposition 20. Doing
so would, in particular, enable us to decide whether $x$ is positive recurrent, null
recurrent, or transient. The following result accomplishes this goal.




Proposition 22. Fix $c > 0$ and let $\tilde{X}$ be a Markov sequence with initial state
$x$. Let $X$ be the pure-jump Markov process with bounded rates constructed from
$c$ and $\tilde{X}$ as in Theorem 19. Let $R$ denote the waiting time distribution for the
renewal sequence $n \rightsquigarrow I_{[\tilde{X}_n = x]}$. Then the renewal process $t \rightsquigarrow I_{[X_t = x]}$ corresponds
to a subordinator with drift 1 and Lévy measure $\nu$ given by

$$\frac{d\nu}{d\lambda}(y) = c\,e^{-cy} \sum_{n=2}^{\infty} \frac{c^{n-1}y^{n-2}}{(n-2)!}\,R\{n\} \quad \text{for } y \in (0,\infty)$$

and

$$\nu\{\infty\} = c\,R\{\infty\},$$

where $\lambda$ denotes Lebesgue measure.


Problem 34. Prove the preceding proposition. 


Problem 35. Use Proposition 22 to give a partial check of the calculations in Ex- 
ample 3 of Chapter 30. 


Problem 36. Apply Proposition 22 in case the state space is $\mathbb{Z}^+$ and the transition
operator $\tilde{T}$ for the Markov sequence $\tilde{X}$ is given by


$f(x-1)$ if $x > 0$
for some $b \in [0,1)$. Also find all equilibrium distributions, the jump-rate function,
all transition probabilities, and all transition rates for the Markov process $X$.


Problem 37. The Markov process of the preceding problem has the property that 
whenever it leaves the state 0 it makes one visit to the state 1 before returning to 0 
and whenever it leaves the state 1 it makes one visit to the state 0 before returning 
to 1. Reconcile this symmetry with the fact that the equilibrium distribution 
assigns different values to these two states. 


Problem 38. For the setting of Proposition 22 prove that if $R\{\infty\} = 0$, then

$$1 + \int_{(0,\infty)} y\,\nu(dy) = \sum_{n=1}^{\infty} n\,R\{n\},$$


whether finite or infinite. Then deduce that the renewal sequence and the renewal 
process are both positive recurrent, both null recurrent, or both transient. 


CHAPTER 32 
Interacting Particle Systems 


An ‘interacting particle system’ can be informally described as a Markov process 
consisting of countably many pure-jump processes that interact by modifying 
each other’s transition rates. Each individual pure-jump process in such a system 
is located at a ‘site’ and has state space {0,1,2,...,n}. The state of the pure- 
jump process at a given site is the number of ‘particles’ at that site, with n being 
the maximum particle number. 

These systems have been used as models in a variety of practical applications, 
in such fields as physics, biology, and computer science. From a mathematical 
point of view, they form a rich class of Markov processes, capable of a wide 
variety of behaviors. They are the focus of much current research, and many 
fundamental questions about them remain to be answered. In this chapter, we 
will introduce them in a way that gives some idea about how they are constructed 
and how they behave, while avoiding many of the technicalities associated with 
the general theory. 


32.1. Configuration spaces and infinitesimal generators 
The state space of an interacting particle system is

$$\Xi = \{0,1,\dots,n\}^{\mathbb{Z}^d},$$

where $d$ and $n$ are positive integers. Thus, an element $\xi$ in this space can be
regarded as a collection $\xi = (\xi(x)\colon x \in \mathbb{Z}^d)$ of nonnegative integers, indexed
by the $d$-dimensional integer lattice $\mathbb{Z}^d$. We have some special terminology for
describing this state space and its members. A member of $\Xi$ is a configuration of
particles, and $\Xi$ itself is configuration space. Configurations are typically denoted
by the Greek letters $\xi, \eta, \zeta$. The points in the integer lattice $\mathbb{Z}^d$ are called sites.
Given a configuration $\xi = (\xi(x)\colon x \in \mathbb{Z}^d)$, the quantity $\xi(x)$ is the particle
number at the site $x$. If $\xi(x) = 0$, we sometimes say that the site $x$ is vacant,
and if $\xi(x) = k \ne 0$, we say that $x$ is occupied by $k$ particles. The parameter $n$
is the maximum particle number.




Since $\Xi$ is a countable product of finite sets, it is a compact Polish space with
the product topology. See Problem 2 to help gain an understanding of the notion
of convergence in such a space. Interacting particle systems are random cadlag
functions with values in configuration space. In other words, an interacting
particle system is a $D([0,\infty),\Xi)$-valued random variable $X$, whose state $X_t$ at
each time $t \in [0,\infty)$ is a configuration with particle numbers $X_t(x)$, $x \in \mathbb{Z}^d$.

The interacting particle systems that we will study are allowed to make three
types of transitions. These are: (i) births, in which the particle number at a
site $x$ increases by 1 (mod $n+1$); (ii) deaths, in which the particle number at a
site $x$ decreases by 1 (mod $n+1$); and (iii) particle jumps, in which a particle is
transferred from one site $x$ to another site $y$. When these transitions occur, we
will say, respectively, that a birth occurs at $x$, a death occurs at $x$, and a particle
jumps from $x$ to $y$. One may view the occurrence of a particle jump from $x$ to $y$
as the simultaneous occurrence of a death at $x$ and a birth at $y$.

Note that according to our definitions, if a birth occurs at a site occupied 
by n particles, then that site becomes vacant. This is an example of wrap- 
around. Similarly, wrap-around occurs if there is a death at a vacant site, thereby 
producing a site occupied by n particles. There are some interesting models for 
which this type of behavior is natural. In models where such transitions are not 
desirable, we will exclude them by setting certain transition rates equal to 0. 

It will be useful to have some notation for describing the types of transitions
defined above. Given a configuration $\xi$ and a site $x$, let

$$\xi^x \quad \text{and} \quad {}_x\xi$$

denote the configurations obtained from $\xi$ by, respectively, increasing or decreasing
the particle number at $x$ by 1 (mod $n+1$). Given a configuration $\xi$ and two
sites $x,y$, we define

$${}_x\xi^y = ({}_x\xi)^y.$$

Then $\xi^x$, ${}_x\xi$, and ${}_x\xi^y$ are the respective results of a birth at $x$, a death at $x$, and
a particle jump from $x$ to $y$, in the configuration $\xi$.
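
These three operations are easy to mirror in code when attention is restricted to finitely many sites. The Python sketch below is only an illustration: representing a configuration as a dictionary from sites to particle numbers, and the function names `birth`, `death`, and `jump`, are assumptions made for the example.

```python
def birth(xi, x, n):
    """Configuration after a birth at x: particle number up by 1 (mod n + 1)."""
    eta = dict(xi)                       # copy, so the input is left unchanged
    eta[x] = (xi[x] + 1) % (n + 1)
    return eta

def death(xi, x, n):
    """Configuration after a death at x: particle number down by 1 (mod n + 1)."""
    eta = dict(xi)
    eta[x] = (xi[x] - 1) % (n + 1)
    return eta

def jump(xi, x, y, n):
    """Configuration after a particle jump from x to y: a death at x
    followed by a birth at y."""
    return birth(death(xi, x, n), y, n)
```

With maximum particle number 2, a birth at a site holding 2 particles leaves it vacant, and a death at a vacant site leaves it holding 2 particles, which is the wrap-around behavior described above.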

We will define interacting particle systems in terms of transition rates
associated with the three types of transitions just introduced. The birth and death
rates at a site $x$ are respectively denoted by

$$b_x\colon \Xi \to \mathbb{R}^+ \quad \text{and} \quad d_x\colon \Xi \to \mathbb{R}^+,$$

and the particle jump rate from $x$ to $y$ is denoted by

$$j_{xy}\colon \Xi \to \mathbb{R}^+.$$

Each of these rates is an $\mathbb{R}^+$-valued function on configuration space. Thus the
birth and death rates at a site $x$ and the particle jump rates for sites $x$ and $y$
may be influenced by the particle numbers at sites other than $x$ and $y$. This is
the ‘interaction’ in interacting particle systems.




Recall from Chapter 31 that an infinitesimal generator can be used to give 
an efficient description of a pure-jump Markov process. It should be apparent 
from the discussion up to this point that interacting particle systems are similar 
to pure-jump Markov processes, so it is natural to try to specify an infinitesimal 
generator for an interacting particle system. We will not in general be able to 
make its domain be the set of all bounded measurable functions, so our first 
task is to introduce a somewhat smaller set of functions on which to specify the 
infinitesimal generator. 

For a finite set $A = \{x_1,\dots,x_k\}$ in $\mathbb{Z}^d$, let $\mathfrak{F}_A$ denote the collection of all
functions of the form

$$\xi \rightsquigarrow \varphi(\xi(x_1),\dots,\xi(x_k)),$$

where $\varphi$ is a function from $\{0,\dots,n\}^k$ to $\mathbb{R}$. Because of the way in which they
are defined, we say that the functions in $\mathfrak{F}_A$ depend only on the sites in $A$. Let

$$\mathfrak{F} = \bigcup_{\text{finite } A} \mathfrak{F}_A.$$

It is easy to check that the functions in $\mathfrak{F}$ are all bounded and measurable. In
fact, they are continuous, and every continuous function from $\Xi$ to $\mathbb{R}$ can be
uniformly approximated by a member of $\mathfrak{F}$ (see Problem 3).

The infinitesimal generators of interacting particle systems will be defined in
terms of birth, death, and particle jump rates. In order to avoid some of the
technicalities involved in such a definition, we will restrict these rates in the
following way. For $r > 0$, set

$$N(r) = \{z \in \mathbb{Z}^d \colon |z| \le r\},$$

where $|z|$ denotes the usual Euclidean norm of $z$. We say that a collection of
rates $b_x, d_x, j_{xy}$, $x,y \in \mathbb{Z}^d$, has range $r$ if for all $x \in \mathbb{Z}^d$,

$$b_x,\ d_x,\ j_{xy} \in \mathfrak{F}_{x+N(r)} \quad \text{for } y \in \mathbb{Z}^d,$$

and

$$j_{xy} = 0 \quad \text{if } y \notin (x + N(r)) \setminus \{x\}.$$

A collection of rates with range $r$ for some $r$ has finite range.


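We now use rates with finite range to define a class of operators that will be
used as infinitesimal generators. We set

$$\begin{aligned}
Gf(\xi) = {} & \sum_{x \in \mathbb{Z}^d} b_x(\xi)\,[f(\xi^x) - f(\xi)] \\
& + \sum_{x \in \mathbb{Z}^d} d_x(\xi)\,[f({}_x\xi) - f(\xi)] \\
& + \sum_{\substack{x,y \in \mathbb{Z}^d \\ x \ne y}} j_{xy}(\xi)\,[f({}_x\xi^y) - f(\xi)].
\end{aligned} \tag{32.1}$$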




Note that the sums in this expression have only finitely many nonzero terms for
any given $f \in \mathfrak{F}$. When discussing $G$ we will focus on how it acts on members
of $\mathfrak{F}$, although we will eventually show under an additional assumption that
it represents an infinitesimal generator of a strong Markov family, so that its
domain may include functions that do not belong to $\mathfrak{F}$.
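
To make the action of $G$ concrete, the Python sketch below evaluates the three sums with the site set replaced by a finite list. This truncation is an assumption made for illustration (it is harmless when the function and the rates only involve the listed sites), and the names `shift`, `generator`, and the rate callbacks `b`, `d`, `j` are hypothetical.

```python
def shift(xi, x, delta, n):
    """Copy of xi with the particle number at x changed by delta (mod n + 1)."""
    eta = dict(xi)
    eta[x] = (xi[x] + delta) % (n + 1)
    return eta

def generator(f, xi, sites, n, b, d, j):
    """Evaluate the birth, death, and particle jump sums over a finite list of sites.

    b(x, xi), d(x, xi), and j(x, y, xi) are hypothetical callables returning
    the birth, death, and particle jump rates in the configuration xi.
    """
    fx = f(xi)
    total = 0.0
    for x in sites:
        total += b(x, xi) * (f(shift(xi, x, +1, n)) - fx)      # birth at x
        total += d(x, xi) * (f(shift(xi, x, -1, n)) - fx)      # death at x
        for y in sites:
            if y != x:
                moved = shift(shift(xi, x, -1, n), y, +1, n)   # jump from x to y
                total += j(x, y, xi) * (f(moved) - fx)
    return total
```

Terms whose transition does not change the value of `f` contribute nothing, which is the mechanism behind the finiteness of the sums noted above.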

Here is the definition of ‘interacting particle system’ that is used in this book. 


Definition 1. Let $G$ be defined in terms of rates that have finite range, as
in (32.1). A Markov process is an interacting particle system if its infinitesimal
generator has the form (32.1) on $\mathfrak{F}$. A Markov family of such processes is a
family of interacting particle systems.


* Problem 1. Suppose that there is a finite set $A$ such that $b_x = d_x = j_{xy} = 0$
for $x \notin A$ and $j_{xy} = 0$ for $y \notin A$. Show that the formula for $G$ in (32.1) is
meaningful for all bounded measurable functions $f$ and that $G$ with this larger
domain is the infinitesimal generator of the transition semigroup of a pure-jump
Markov family with bounded rates, thereby proving the existence of an interacting
particle system with infinitesimal generator $G$. Find the corresponding transition
rates $q_{\xi\eta}$, $\xi,\eta \in \Xi$, in terms of $b_x$, $d_x$, and $j_{xy}$.


Problem 2. Show that a sequence (ξ_k: k ≥ 0) of configurations converges in Ξ to
a configuration ξ if and only if for each site x, there exists an integer l such that
ξ_k(x) = ξ(x) for all k ≥ l.


Problem 3. Show that the functions in 𝔉 are continuous, and that every continuous
function from Ξ to R is the uniform limit of a sequence of functions in 𝔉.


Problem 4. Let X be an interacting particle system with infinitesimal generator 
G. Show that the random cadlag function Y defined by Y_t = X_{ct}, t ∈ [0, ∞), is an
interacting particle system with infinitesimal generator cG. 


32.2. The universal coupling 


In this section, we make the further assumption that all rates are bounded above 
by some finite constant c. In this case we say that G has bounded rates. 

We will describe a procedure for constructing all interacting particle systems 
on Z^d having bounded finite-range rates on a common probability space, thereby
‘coupling’ all these systems. Briefly, this universal coupling involves three steps. 
In the first step, we define an independent collection of Poisson point processes 
and a corresponding probability space; this step is ‘universal’, in the sense that it 
does not depend on the rates of the interacting particle system being constructed. 
At the end of the step, an existence and uniqueness result is stated that specifies 
the precise connection between the Poisson point processes and any interacting 
particle system with a given set of rates. The second and third steps of the 
construction constitute the proof of this result. 




In the second step we define certain random objects that, in some sense, 
represent ‘regions of interaction’ between various sites. This step depends on 
the rates of the system being constructed, but not on its initial state. In the 
third step, we show how to use the random objects from the second step together 
with the initial state to calculate the state of the system at any given time. 

First step of the construction. Let 


$$\{B^x, D^x : x \in \mathbb{Z}^d\} \cup \{J^{xy} : x, y \in \mathbb{Z}^d,\ x \ne y\}$$


be an iid collection of Poisson point processes on (0, ∞) × (0, ∞), with intensity
measure λ_2, where λ_2 is two-dimensional Lebesgue measure, and denote by
(Ω, 𝓕, P) the probability space on which these point processes are defined. The
points in these point processes will be used to indicate times at which births, 
deaths, and particle jumps might possibly occur. 
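
For readers who want to experiment, the following Python sketch samples the part of one such point process that lies in a bounded window (0, horizon] × (0, c]. It relies on the standard fact that, for Lebesgue intensity, the first coordinates of the points in this strip form a rate-c Poisson process and the second coordinates are independent and uniform on (0, c]. The name `sample_points` and the restriction to a bounded window are assumptions made for illustration only.

```python
import random

def sample_points(horizon, c, rng):
    """Points of a unit-intensity Poisson process in (0, horizon] x (0, c].

    First coordinates arrive at rate c (exponential gaps); each point gets an
    independent second coordinate that is uniform on (0, c].
    """
    points, t = [], 0.0
    while True:
        t += rng.expovariate(c)
        if t > horizon:
            return points
        u = c * (1.0 - rng.random())  # rng.random() is in [0, 1), so u is in (0, c]
        points.append((t, u))

# Example: one realization of (a window of) B^x for a single site x.
points = sample_points(horizon=10.0, c=2.0, rng=random.Random(1))
```

One independent call per process B^x, D^x, J^{xy} would give a finite-window stand-in for the iid collection described above.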


Problem 5. Prove that with probability 1, no first coordinates of any of the random 
points in any of these point processes are equal. 


As we continue with the first step of the construction, we note that we will 
always assume that we are working in the event of probability 1 identified in 
the preceding problem. (As we will see, it is the first coordinates that will 
actually be times at which things happen, so there will never be two simultaneous 
happenings.) 

We next define some sub-σ-fields of 𝓕: for t ∈ [0, ∞), let

$$\mathcal{F}_t = \sigma\bigl(B^x \cap ((0,t] \times (0,\infty)),\ D^x \cap ((0,t] \times (0,\infty)),\ J^{xy} \cap ((0,t] \times (0,\infty)) : x, y \in \mathbb{Z}^d,\ x \ne y\bigr).$$

We will construct processes that are Markov with respect to the filtration
(𝓕_t: t ∈ [0, ∞)).


Remark 1. It will be clear in the construction that if the rates are bounded 
above by c, then the points of the Poisson processes whose second coordinates 
are larger than c will not play a role in the construction of the corresponding 
family of interacting particle systems. 


Theorem 2. Let (Ω, 𝓕, P) be the probability space for the Poisson point processes
described above. For each set of rates b_x, d_x, j_{xy}, x, y ∈ Z^d, x ≠ y, and
each initial state ξ ∈ Ξ, there exists an a.s.-unique D([0, ∞), Ξ)-valued random
variable X satisfying the following four rules for all x, y ∈ Z^d and t ∈ (0, ∞):


(i) Initial state rule. X_0 = ξ;

(ii) Birth rule. X has a birth at x at time t if and only if there exists
u ∈ (0, ∞) such that (t, u) ∈ B^x and b_x(X_{t−}) > u;

(iii) Death rule. X has a death at x at time t if and only if there exists
u ∈ (0, ∞) such that (t, u) ∈ D^x and d_x(X_{t−}) > u;

(iv) Particle jump rule. X has a particle jump from x to y at time t if and
only if there exists u ∈ (0, ∞) such that (t, u) ∈ J^{xy} and j_{xy}(X_{t−}) > u.


Moreover, X is a strong Markov process with respect to the filtration
(𝓕_t: t ∈ [0, ∞)) defined above, whose infinitesimal generator is given by (32.1) on 𝔉.


PROOF. As stated above, the bulk of the proof consists of the second and 
third steps in the construction. 

Second step of the construction. This step depends on the Poisson point 
processes introduced in the preceding step, the bound c, and the range r. For
each x we create a point process 𝒯_x = {T_{x,1} < T_{x,2} < ...} on R^+ \ {0} via

$$T_{x,n} = \inf\Bigl\{t > T_{x,n-1} : (t,v) \in B^x \cup D^x \cup \bigcup_{y \in x+N(r),\ y \ne x} (J^{xy} \cup J^{yx}) \text{ for some } v \le c\Bigr\},$$

where T_{x,0} is defined to equal 0. Since projections and independent unions of
Poisson point processes are themselves Poisson point processes, provided the
projections of the intensity measures are Radon measures, it is clear that the
point processes 𝒯_x are identically distributed Poisson point processes with common
intensity measure equal to 2c♯N(r) times 1-dimensional Lebesgue measure.
The rules given in the statement of the theorem indicate that we want X to
have the property that for each x, t ⇝ X_t(x) is constant on intervals of the form
[T_{x,i−1}, T_{x,i}). Thus, the random times in 𝒯_x are the only times at which we allow
possible births, deaths, or particle jumps that involve the site x. We will need 
to wait until the third step of the construction to see just how it is determined 
whether such transitions actually occur at these times. 
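
The stated intensity can be checked by counting the contributing processes. The display below is a short bookkeeping sketch, assuming (as in the definition of the times T_{x,n}) that only points with second coordinate at most c are used, so that each contributing process projects to a rate-c Poisson process on the time axis.

```latex
% B^x and D^x contribute rate c each; for each of the \sharp N(r) - 1 sites
% y in (x + N(r)) \setminus \{x\}, the two processes J^{xy} and J^{yx}
% contribute rate c each.
\underbrace{c}_{B^x} \;+\; \underbrace{c}_{D^x}
  \;+\; \underbrace{2c\,(\sharp N(r) - 1)}_{J^{xy},\ J^{yx}}
  \;=\; 2c\,\sharp N(r).
```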

Now fix x and a positive integer i. We will inductively define a sequence of
random times U_1 ≥ U_2 ≥ ... and a sequence of random finite sets A_1 ⊆ A_2 ⊆ ...
associated with the pair (x, i). Let U_1 = T_{x,i}. If (U_1, v) ∈ J^{yx} for some v and y,
let A_1 = y + N(r). Otherwise let A_1 = x + N(r). Proceeding recursively and
using the convention sup ∅ = 0, we set

$$U_{j+1} = \sup\{t < U_j : t \in \mathcal{T}_z \text{ for some } z \in A_j\}$$

if U_j > 0 and U_{j+1} = 0 if U_j = 0, and we set

$$A_{j+1} = A_j \cup \bigcup_{z:\ U_{j+1} \in \mathcal{T}_z} (z + N(r))$$

if U_{j+1} > 0 and A_{j+1} = A_j if U_{j+1} = 0. (Notice that U_{j+1} ∈ 𝒯_z for either one or two
values of z in case U_{j+1} > 0.) Having defined the random times U_j, j = 1, 2, ...,
let

$$K(x,i) = \sup\{j : U_j > 0\}.$$

We will show in this step that K(x, i) is a.s.-finite.

The times U_j and sets A_j are illustrated in Figure 32.1. The dots connected
by curved arrows represent the locations and times of possible particle jumps;
an arrow from (w, s) to (z, s) indicates that J^{wz} has a point with first
coordinate s. The other dots represent the locations and times of possible
births or deaths. The value K = 7 refers to the fact that U_7 > 0 but U_8 = 0.


FIGURE 32.1. Second step of construction for r = d = 1, with K = 7. [Axis label: time.]


We want to prove that with probability one, U_j = 0 for some j, thereby showing
that K(x, i) is a.s.-finite. Let us calculate a conditional moment generating
function:

$$
\begin{aligned}
E\bigl(e^{-w(U_1 - U_j)} \bigm| U_j \ne 0;\ A_1, \dots, A_j\bigr)
&= E\Bigl(\prod_{m=1}^{j-1} e^{-w(U_m - U_{m+1})} \Bigm| U_j \ne 0;\ A_1, \dots, A_j\Bigr) \\
&\le \prod_{m=1}^{j-1} \frac{2cm[\sharp N(r)]^2}{w + 2cm[\sharp N(r)]^2},
\end{aligned}
$$

where the last step uses ♯A_j ≤ j♯N(r) and the fact that each Poisson point
process 𝒯_z has intensity 2c♯N(r) times Lebesgue measure. For w > 0, this
product approaches 0 as j → ∞. It follows that, conditioned on U_j ≠ 0,
U_1 − U_j → ∞ in distribution. By the monotonicity of j ⇝ U_j, we see that for almost
every ω ∈ ⋂_{j=1}^∞ [U_j ≠ 0], U_1(ω) − U_j(ω) → ∞ as j → ∞. Since U_1(ω) − U_j(ω) ≤
U_1(ω) < ∞, we see that ⋂_{j=1}^∞ {ω: U_j(ω) ≠ 0} is a null event, as desired.
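
The claim that the product tends to 0 for w > 0 rests on the divergence of the harmonic series. A short justification, written with the abbreviation λ_m = 2cm[♯N(r)]² (introduced here only for this sketch) and the elementary inequality 1 − a ≤ e^{−a}:

```latex
\prod_{m=1}^{j-1} \frac{\lambda_m}{w+\lambda_m}
  = \prod_{m=1}^{j-1}\Bigl(1 - \frac{w}{w+\lambda_m}\Bigr)
  \le \exp\Bigl(-\sum_{m=1}^{j-1} \frac{w}{w+\lambda_m}\Bigr),
\qquad \lambda_m = 2cm[\sharp N(r)]^2 ,
\\[1ex]
\sum_{m=1}^{\infty} \frac{w}{w+\lambda_m}
  \ge \frac{w}{w + 2c[\sharp N(r)]^2} \sum_{m=1}^{\infty} \frac{1}{m} = \infty ,
\quad\text{since } w + \lambda_m \le m\,\bigl(w + 2c[\sharp N(r)]^2\bigr)
\text{ for } m \ge 1 .
```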

We have shown that K(x, i) is a.s.-finite. Since there are only countably many
ordered pairs (x, i), the random function K, defined by (x, i) ⇝ K(x, i), is almost
surely (Z^+ \ {0})-valued.




Third step of the construction. This step will depend on the initial state
ξ. For each site x and each time T_{x,i} ∈ 𝒯_x, we need to decide whether or not a
transition involving x occurs at that time, and if so, whether it is a birth, death,
or particle jump.

We will define X_t(x) for t ∈ [T_{x,i−1}, T_{x,i}) by induction on the value of K(x, i).
To get the induction started, we let K(x, 0) = 0 and define X_t(x) = ξ(x) for all
sites x and all times t ∈ [0, T_{x,1}), consistent with the four rules in the statement
of the theorem.

Now assume inductively that X_t(x) has been defined for t ∈ [T_{x,i−1}, T_{x,i}) for
all pairs (x, i) such that K(x, i) < k, where k is some positive integer. Now fix x
and i such that K(x, i) = k. Let A_1 be the random set corresponding to x and
i defined in the second step of the construction. It follows from that step of the
construction that K(z, j) < k for each z ∈ A_1 and each integer j ≥ 0 such that
T_{z,j} < T_{x,i}. By the inductive hypothesis, the quantities X_{T_{x,i}−}(z) are defined
for z ∈ A_1. Now define X_{T_{x,i}}(x) in terms of these quantities as follows:


• If (T_{x,i}, v) ∈ B^x for some (no more than one) v, then we introduce a birth
at time T_{x,i} and site x if v < b_x(X_{T_{x,i}−}), and make no change at that
site and time otherwise; that is, X_{T_{x,i}}(x) = 1 + X_{T_{x,i}−}(x) (mod n + 1)
if v < b_x(X_{T_{x,i}−}), and X_{T_{x,i}}(x) = X_{T_{x,i}−}(x) otherwise.

• If (T_{x,i}, v) ∈ D^x for some v, then we introduce a death at time T_{x,i}
and site x if v < d_x(X_{T_{x,i}−}), and make no change at that site and time
otherwise.

• If (T_{x,i}, v) ∈ J^{xy} for some v and y, then we subtract 1 (mod n + 1) from
the particle number at site x at time T_{x,i} if v < j_{xy}(X_{T_{x,i}−}), and make
no change at that site and time otherwise.

• If (T_{x,i}, v) ∈ J^{yx} for some v and y, then we add 1 (mod n + 1) to the particle
number at site x at time T_{x,i} if v < j_{yx}(X_{T_{x,i}−}), and make no change at
that site and time otherwise.


For t ∈ [T_{x,i}, T_{x,i+1}), let X_t(x) = X_{T_{x,i}}(x). It is easy to see from the definitions
that, for all sites x, T_{x,i} → ∞ a.s. as i → ∞, so we have defined X_t(x) for all
sites x and times t ∈ [0, ∞).

Our construction of X is complete. Note that this construction has the property
that if X_s is known for some s ∈ [0, ∞), then for any time t > s, X_t can be
determined by treating s as if it were time 0 and X_s as if it were an initial state,
and then following the construction procedure using the portions of the Poisson
point processes B^x, D^x, J^{xy} that lie in (s, ∞) × (0, ∞), suitably shifted in the
negative first-coordinate direction by s units. By the independence properties
of Poisson point processes, these shifted point processes are independent of the
σ-field 𝓕_s. It follows that the conditional distribution given 𝓕_s of the stochastic
process (X_t: t ≥ s) depends only on X_s. Thus, X is Markov. A similar argument,
in which the time s is replaced by a stopping time, can be used to show
that X is strong Markov. Alternatively, one can show that X is strong Markov
by showing that it is Feller, as the reader is requested to do in Problem 7. □




An important feature of the construction just given is that for a fixed site x 
and time t, the value of X_t(x) can be determined from finitely many of the values
ξ(z) in the initial state ξ and the intersections of finitely many of the Poisson
point processes with some compact subset of (0, ∞) × (0, ∞). The random sets
A_1, A_2, ..., are used to indicate exactly which of the values ξ(z) and what parts
of the Poisson point processes are needed. One consequence of this observation
is that for any site x and time t, there exists a random set A such that the
processes with initial states ξ and η agree at x at time t if ξ(y) and η(y) agree
at sites y ∈ A. This consequence is relevant for proving that the processes we
have constructed are Feller (Problem 7). 

For the purposes of understanding the behavior of interacting particle systems,
it is often useful to consider the rules in Theorem 2 to constitute
the definition of X. The following example illustrates both the construction and
Theorem 2.


Example 1. Let the dimension d and maximum particle number n both be 
1. Define rates with range 1 as follows: 


$$b_x(\xi) = \tfrac{1}{2}\,[1 - \xi(x)]\,[\xi(x-1) + \xi(x+1)], \qquad d_x(\xi) = \tfrac{1}{5}\,\xi(x), \qquad j_{xy} = 0.$$


Thus, the particle jump rates are all 0, and the death rate at an occupied site 
is 1/5. There are no deaths at vacant sites or births at occupied sites. The only 
real ‘interaction’ between different sites involves the birth rates at vacant sites, 
which can be 0, .5, or 1, depending on how many of the neighbors are occupied. 
We take as the initial state the configuration ξ in which only the sites −1 and 1
are occupied; that is, ξ(1) = ξ(−1) = 1 and ξ(x) = 0 if x ≠ ±1.

Figure 32.2 illustrates the time evolution of X constructed as described in 
this section. The axes are oriented the same as in Figure 32.1. The open circles 
indicate possible birth times, and the black dots indicate possible death times, 
with the upper bound 1. We have not bothered to show the possible particle 
jump times, since they are irrelevant when the particle jump rates are 0. The 
numbers next to the possible birth and death times are the values of the second 
coordinate of the corresponding points in the relevant Poisson point processes. 
The vertical axes are thickened to indicate occupied sites. By following the rules 
in Theorem 2, the reader should be able to verify the correctness of the figure. 
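
The rules of Theorem 2 can also be followed mechanically. The Python sketch below does this for the rates of Example 1 under two simplifying assumptions that are not part of the text: the lattice is truncated to a finite window of sites (sites outside the window are treated as permanently vacant), and time is truncated to a finite horizon. All function names are illustrative. Because the same Poisson points are reused for every initial state, the sketch also shows the coupling at work.

```python
import random

def birth_rate(xi, x):
    """b_x for Example 1: at a vacant x, half the number of occupied neighbors."""
    return (1 - xi.get(x, 0)) * (xi.get(x - 1, 0) + xi.get(x + 1, 0)) / 2

def death_rate(xi, x):
    """d_x for Example 1: an occupied site dies at rate 1/5."""
    return xi.get(x, 0) / 5

def poisson_events(sites, horizon, c, rng):
    """Sorted (time, mark, kind, site) tuples from independent copies of B^x, D^x.

    Only marks in (0, c] are generated, since larger marks never matter
    when the rates are bounded by c.
    """
    events = []
    for x in sites:
        for kind in ("birth", "death"):
            t = rng.expovariate(c)
            while t <= horizon:
                events.append((t, c * (1.0 - rng.random()), kind, x))
                t += rng.expovariate(c)
    return sorted(events)

def evolve(xi0, events):
    """Apply the birth and death rules of Theorem 2 (n = 1, no particle jumps)."""
    xi = dict(xi0)
    for _t, u, kind, x in events:
        if kind == "birth" and birth_rate(xi, x) > u:
            xi[x] = 1
        elif kind == "death" and death_rate(xi, x) > u:
            xi[x] = 0
    return xi

sites = range(-3, 4)
events = poisson_events(sites, horizon=5.0, c=1.0, rng=random.Random(0))
xi = {x: 0 for x in sites}
xi[-1] = xi[1] = 1            # the initial state of Example 1
eta = {x: 0 for x in sites}
eta[1] = 1                    # a smaller initial state, coupled to the same events
final_xi, final_eta = evolve(xi, events), evolve(eta, events)
```

Processing the events in increasing time order plays the role of the second and third steps of the construction; on a finite window there are only finitely many events, so no backward tracing of the sets A_j is needed.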


Problem 6. Using Figure 32.2, calculate the time evolution of the interacting par- 
ticle system with the same rates as in Example 1, but with the initial state in 
which only 0 is occupied. 


Problem 7. Prove that the transition semigroups of the processes constructed in 
this section with bounded finite-range transition rates are Feller, and hence that 
the collection of processes constructed from given rates and various initial states 
is a strong Markov family. 




FIGURE 32.2. Time evolution of an interacting particle system (horizontal axis: sites −3 to 3; vertical axis: time)


Problem 8. Show that the infinitesimal generator of a Markov family constructed 
as in this section with bounded finite-range transition rates agrees with G as defined 
in (32.1) for f ∈ 𝔉. Hint: One approach is to use the definition of the infinitesimal
generator in terms of the transition semigroup. 


Remark 2. It can be shown that any two Markov families of distributions, each
having an infinitesimal generator that agrees with some common G of the form
(32.1) for f ∈ 𝔉, are identical.


Problem 9. Fix the dimension d, the range r, the maximum particle number n,
and the upper bound on the rates c, and within this context, let G and (G^{(k)}: k =
1, 2, ...) be infinitesimal generators of the form (32.1). Let X and X^{(k)}, k =
1, 2, ..., be corresponding interacting particle systems, all constructed by way of
the universal construction, with respective initial states ξ and ξ_k, k = 1, 2, ....
Show that if ξ_k → ξ as k → ∞ and for each f ∈ 𝔉, there is a j(f) such that
G^{(k)}f = Gf for all k ≥ j(f), then with probability 1, X_t^{(k)} → X_t as k → ∞,
uniformly for t in bounded subsets of [0, ∞).


Problem 10. Weaken the hypothesis in the last sentence of Problem 9 so an ‘if and 
only if’ result is obtained. 




32.3. Examples 


In each of the following examples, we will give the dimension d, the maximum 
particle number n, the range r, and the rates, along with a brief heuristic de- 
scription of the behavior of the corresponding interacting particle systems. The 
reader may find Theorem 2 useful in verifying that, in each case, the description 
matches the rates. 


Example 2. [Contact process with linear birth rate] We take d and r to be 
arbitrary positive integers, and n = 1. The particle jump rates are 0, the death 
rates are given by 


$$d_x(\xi) = \delta\,\xi(x)$$


for some parameter δ > 0, and the birth rates are given by


$$b_x(\xi) = \frac{1 - \xi(x)}{\sharp N(r) - 1} \sum_{y \in x+N(r),\ y \ne x} \xi(y).$$

This example generalizes Example 1. The birth rate at a vacant site is propor- 
tional to the number of occupied neighbors, with the maximum birth rate being 
1. The death rate at an occupied site is the constant δ. There is no ‘wrap-around’
(that is, births do not occur at sites containing n = 1 particles, and deaths do 
not occur at vacant sites). The state of extinction, which is the configuration in 
which all sites are vacant, denoted by 0, is absorbing, since all of the rates are 0 
in that state. 
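
The rates of this example are easy to evaluate on a configuration with finitely many occupied sites. The Python sketch below represents such a configuration as a set of occupied sites (integer tuples) and assumes the birth rate is normalized by the number ♯N(r) − 1 of neighbors, so that the maximum birth rate is 1 as stated above. The function names are illustrative.

```python
from itertools import product

def neighbors(x, r):
    """Sites y in x + N(r) with y != x, for a site x given as an integer tuple."""
    m = int(r)
    offsets = (
        z for z in product(range(-m, m + 1), repeat=len(x))
        if any(z) and sum(c * c for c in z) <= r * r
    )
    return [tuple(a + b for a, b in zip(x, z)) for z in offsets]

def contact_birth_rate(occupied, x, r):
    """Fraction of occupied neighbors if x is vacant, and 0 if x is occupied."""
    if x in occupied:
        return 0.0
    nbrs = neighbors(x, r)
    return sum(y in occupied for y in nbrs) / len(nbrs)

def contact_death_rate(occupied, x, delta):
    """The constant delta at an occupied site, and 0 at a vacant site."""
    return delta if x in occupied else 0.0
```

With d = r = 1 the birth rate at a vacant site is 0, 0.5, or 1 according to the number of occupied neighbors, which matches the description in Example 1.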

One imagines a population of simple organisms that do not move about, but 
which have offspring at neighboring vacant sites. The organisms die at a constant 
rate. If all of the neighbors of an occupied site are vacant, then the organism at 
that site has a total propagation rate of 1, and in general, the total propagation 
rate of an organism is proportional to the number of vacant neighboring sites. 
Thus, the average birth rate per capita is smaller in ‘crowded’ populations than 
in ‘sparse’ ones. Since it only takes a single occupied site to cause births at 
other sites, we sometimes say that this process exhibits ‘asexual reproduction’. 
See Problem 12 below for a variation with ‘sexual reproduction’. 

An interesting question is whether a contact process starting with a single 
occupied site can survive, or in other words, avoid the state of extinction, with 
positive probability. It is not hard to show (see Problem 13 below) that the 
answer is ‘no’ if δ is large enough. It has been shown, using an argument that
we omit, that the answer is ‘yes’ for sufficiently small δ > 0.


Example 3. [Particle jump process with exclusion] The quantities d and r 
are arbitrary positive integers, and n = 1. The birth and death rates are all 0. 
The particle jump rates are defined in terms of a probability measure ρ on N(r):


$$j_{xy}(\xi) = \xi(x)\,(1 - \xi(y))\,\rho\{y - x\}.$$


Since j_{xy} can only be positive if x is occupied and y is vacant, there is no
wrap-around.

One imagines a system of particles, each of which attempts at rate 1 to take 
a step with step distribution ρ. If the step would take the particle to a vacant
site, the particle jumps to that site. If it would take the particle to an occupied 
site, the particle stays where it is. The word ‘exclusion’ in the name ‘particle 
jump process with exclusion’ refers to this last part of the description. We will 
see in the next section that exclusion processes have very simple equilibrium 
distributions. 


Example 4. [Particle jump process with wrap-around] We make two changes 
from the preceding example: we let n be an arbitrary positive integer, and define 
the particle jump rates as 


$$j_{xy}(\xi) = \xi(x)\,\rho\{y - x\}.$$


Thus there is no exclusion, and when a particle jumps to a site containing n
particles, that site becomes vacant, due to wrap-around. Since j_{xy} equals the
number of particles at x times ρ{y − x}, one may take the point of view that each
particle at x is independently jumping at rate 1, with step-size distribution ρ.

It can be shown that the limit as n → ∞ of this model exists (in the same
sense as in Problem 9), provided the initial state is not too ‘wild’. We call
this limit a system of independent random walks, since it can be described by
saying that each particle performs an independent random walk with step-size
distribution ρ, the steps being taken at rate 1.


Example 5. [Cyclic threshold process] The dimension d, the maximal parti- 
cle number n, and the range r are all arbitrary positive integers. The death and 
particle jump rates equal 0. The birth rates are defined in terms of a parameter
θ ∈ [0, ♯N(r) − 1]:


$$b_x(\xi) = \begin{cases} 1 & \text{if } \sharp\{y \in x + N(r) : \xi(y) = \xi(x) + 1 \pmod{n+1}\} \ge \theta, \\ 0 & \text{otherwise.} \end{cases}$$


The word ‘cyclic’ refers to the way in which the number of particles at a site 
cycle through the integers 0, ..., n, with wrap-around. The parameter θ is the
‘threshold’. Sometimes these models are called ‘food-chain’ models, particularly
when θ = 1; one imagines that the quantity ξ(x) represents a species type at
x rather than a particle number, and that individuals of species k are ‘eaten’
by individuals of species k + 1 (mod n + 1) in their neighborhood. There are
many interesting open questions about the behavior of such models, particularly
concerning the dependence of that behavior on n, r, θ, and also on the ‘shape’
of the neighborhood N(r). (It is only for simplicity that we have made N(r) 
spherical.) 




Example 6. [Majority vote process] The quantities d and r are arbitrary 
positive integers, and n = 1. The particle jump rates are all 0. Let A C N(r) be 
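a set containing an odd number of sites, and define birth and death rates by


$$b_x(\xi) = [1 - \xi(x)]\,[\varepsilon \vee \mathrm{maj}_x(\xi)] \quad\text{and}\quad d_x(\xi) = \xi(x)\,[\varepsilon \vee (1 - \mathrm{maj}_x(\xi))],$$


where ε is a nonnegative parameter and


$$\mathrm{maj}_x(\xi) = \begin{cases} 1 & \text{if } \sum_{y \in x+A} \xi(y) > \tfrac{1}{2}\sharp A, \\ 0 & \text{otherwise.} \end{cases}$$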


Note that the rates do not allow wrap-around. 
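
To make the role of the majority function concrete, here is a small Python sketch for d = 1. A configuration is a dictionary from sites to 0 or 1 (missing sites count as 0), the set A is given as a tuple of integer offsets, and `eps` stands for the noise parameter ε. These representations and the function names are assumptions made for illustration.

```python
def maj(xi, x, A):
    """Return 1 if more than half of the sites in x + A hold a 1, and 0 otherwise."""
    votes = sum(xi.get(x + a, 0) for a in A)
    return 1 if 2 * votes > len(A) else 0

def vote_rates(xi, x, A, eps):
    """Return the pair (birth rate, death rate) at site x for the majority vote process."""
    m = maj(xi, x, A)
    birth = (1 - xi.get(x, 0)) * max(eps, m)
    death = xi.get(x, 0) * max(eps, 1 - m)
    return birth, death
```

For example, with A = (−1, 0, 1) and ε = 0, a site holding opinion 0 whose two neighbors hold opinion 1 has birth rate 1, while a site that agrees with the local majority has both rates equal to 0.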

To understand this model, first consider the case ε = 0. Imagine that the
particle number at a site x, which can be either 0 or 1, represents the ‘opinion’
of an individual at x. The birth and death rates only allow the individual at x
to change opinion when the majority of the individuals at sites in A + x hold
the opposing opinion, in which case the rate of change is 1. The set A is usually
chosen to contain the origin, so that the opinion of the individual at x counts in
the majority vote. Often A is chosen to equal N(r).

When ε > 0, individuals can change their opinion, even when they are currently
in the majority. This ‘noise’ perturbs the system away from its tendency
towards unanimity. Thus, even if the initial state is 0 = all 0’s or 1 = all 1’s, the
noise will cause infinitely many individuals to change their opinion before any
time t > 0 (see Problem 14 below).

An interesting question is whether or not a system which starts with unanimity
will stay ‘close’ to unanimity for all times t if ε > 0 is sufficiently small. More
precisely, if the initial state ξ is 0 or 1, is it true that for all p > 0 there exists
an ε > 0 such that

$$|E(X_t^{\xi}(x)) - \xi(x)| < p$$

for all t ∈ [0, ∞)? It is known that the answer is ‘no’ for d = r = 1 and any
choice of A ⊆ N(r), and that the answer is ‘yes’ for d ≥ 2, r arbitrary, and
any choice of A containing the origin such that A \ {0} lies strictly inside a
half-space whose boundary contains the origin. For example, take d = 2, r = 1,
and A = {(0,0), (0,1), (1,0)}. The answer is unknown for most other cases. In
particular, the answer is unknown for A = N(r) if d ≥ 2 and r ≥ 1.


Problem 11. [Contact process with threshold birth rates] Change the birth rates 
in Example 2 so that they only take two different values: the birth rate at x is 1 
if x is vacant and at least one site in x + N(r) is occupied; the birth rate at x is 0
otherwise. Let (X^ξ: ξ ∈ Ξ) be the family of interacting particle systems with these
rates, and let (Y^ξ: ξ ∈ Ξ) be the family of interacting particle systems whose rates
are given in Example 2. Show that for all ξ ∈ Ξ, x ∈ Z^d, and t ∈ [0, ∞),


$$P[X_t^{\xi}(x) = 1] \ge P[Y_t^{\xi}(x) = 1].$$


Hint: Use the universal coupling to directly compare the two processes. 




Problem 12. [Contact process with sexual reproduction] Modify the rates in the 
preceding problem so that the birth rate at x is 1 if x is vacant and at least two
sites in x + N(r) are occupied, and 0 otherwise. Show that if r = 1 and the initial 
state contains only finitely many occupied sites, then the hitting time of the state 
0 is finite a.s. Hint: First show that if B is a d-dimensional box with sides parallel 
to the coordinate axes and B contains all of the sites that are occupied in the 
initial state, then no site outside of B can ever become occupied. Then use the 
Borel-Cantelli Lemma to show that there must exist a time interval [k, k+1] during 
which deaths occur at all of the occupied sites in B and no births occur at any of 
the vacant sites in B. 


Problem 13. Show that for any of the three types of contact processes defined in 
this section (Example 2, Problem 11, Problem 12), if δ > 1, then the hitting time
of the state 0 is finite a.s. for any initial state with only finitely many occupied 
sites. Hint: One way to do this is to use appropriately chosen martingales involving 
the infinitesimal generator G. 


Problem 14. Let G be an infinitesimal generator with bounded finite-range rates. 
Show that if the birth and death rates are all bounded below by a constant ε > 0,
then with probability 1, infinitely many births and deaths will occur during any 
time interval with positive length. 


Problem 15. Show that any family of interacting particle systems with r = 0 is an 
independent family of pure-jump Markov processes. Also show that in this case, 
the method of the previous section can be used to construct interacting particle 
systems even in the case that the rates are not all bounded above by some constant. 


Problem 16. [An example due to Blackwell] Let X be the interacting particle sys- 
tem with d = n = 1, r = 0, particle jump rates equal to 0, initial state 0, and birth 
and death rates given by 


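$$b_x(\xi) = (1 - \xi(x)) \quad\text{and}\quad d_x(\xi) = \xi(x)\,2^{|x|}.$$

(See Problem 15.) Show that for all t ∈ [0, ∞),

$$\sum_{x \in \mathbb{Z}} X_t(x) < \infty \quad \text{a.s.},$$

but that for all t ∈ (0, ∞),

$$P\Bigl[\exists\, s \in (0,t) : \sum_{x \in \mathbb{Z}} X_s(x) = \infty\Bigr] = 1.$$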


Problem 17. [Annihilating random walks] Explain why the wrap-around particle
jump model with n = 1 could be called a system of ‘annihilating random walks’. 


Problem 18. [Coalescing random walks] Define the birth, death, and particle jump 
rates for a family of interacting particle systems with n = 1 that matches the 
following heuristic description. Let p be a probability measure on N(r) \ {0}. 
Particles jump at rate 1 with step-size distribution p. If a particle jumps to a site 
y that is already occupied, the two particles coalesce into a single particle that 
occupies y. 




32.4. Equilibrium distributions 


Most of the research in interacting particle systems is related in some way to the 
following two questions concerning a given transition semigroup (T_t: t ∈ [0, ∞)):
(i) What are the equilibrium distributions of the transition semigroup? and (ii)
If μ is such an equilibrium distribution, for what distributions ν on the state
space is it the case that νT_t → μ as t → ∞? In this section, we give some basic
results that can be useful in investigating the first question. In the next section, 
we will have something to say about the second question for a special class of 
systems. 

First we note that under our assumption that the maximum particle number 
n is finite, there always exists at least one equilibrium distribution for the tran- 
sition semigroup of any interacting particle system. This fact follows from the 
compactness of the state space Ξ, just as in Problem 48 of Chapter 26.

Next, we wish to give a criterion for determining whether a probability measure
μ is an equilibrium distribution. This criterion is analogous to the one given
in Problem 16 of Chapter 31. 


Theorem 3. Let G be defined as in (32.1) with bounded finite-range rates. 
Then a probability measure μ on the state space Ξ is an equilibrium distribution
of the transition semigroup with infinitesimal generator G if and only if


$$\int Gf \, d\mu = 0 \tag{32.2}$$


for all f ∈ 𝔉.


PROOF. The ‘only if’ part of the theorem is a trivial consequence of the
definition of infinitesimal generator and the Bounded Convergence Theorem. 
We now prove the ‘if’ part. 

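For finite A ⊂ Z^d, let G_A be the infinitesimal generator with birth, death,
and jump rates b_{x,A}, d_{x,A}, j_{xy,A} given by

$$
\begin{aligned}
b_{x,A}(\xi) &= \int \Bigl(b_x(\eta) + \sum_{z \in A^c} j_{zx}(\eta)\Bigr)\, \mu[d\eta \mid \eta(a) = \xi(a),\ a \in A], \\
d_{x,A}(\xi) &= \int \Bigl(d_x(\eta) + \sum_{z \in A^c} j_{xz}(\eta)\Bigr)\, \mu[d\eta \mid \eta(a) = \xi(a),\ a \in A], \\
j_{xy,A}(\xi) &= \int j_{xy}(\eta)\, \mu[d\eta \mid \eta(a) = \xi(a),\ a \in A]
\end{aligned}
$$

for x, y ∈ A. Let the rates equal 0 for x or y in A^c.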
By Problem 1, G_A is the infinitesimal generator of a pure-jump Markov process.
It is straightforward to check from the definition of G_A that


$$\int G_A f \, d(\mu_A \times \nu) = 0 \tag{32.3}$$


for all f ∈ 𝔉_A, where μ_A is the marginal of μ on {0, ..., n}^A and ν is any
probability measure on {0, ..., n}^{A^c}. Since every function in 𝔉 can be written
as a finite linear combination of products of functions in 𝔉_A and functions that
depend only on sites in A^c, it follows that (32.3) holds for all f ∈ 𝔉. Thus,
for any probability measure ν_A on {0, ..., n}^{A^c}, μ_A × ν_A is an equilibrium
distribution for the transition semigroup associated with G_A.

Clearly, no matter how the measures ν_A are chosen,


$$\mu_A \times \nu_A \to \mu \quad \text{as } A \nearrow \mathbb{Z}^d.$$

Since G_A agrees with G on 𝔉_A for each finite A, it follows from this fact and
Problem 9 that the interacting particle system with infinitesimal generator G_A
and initial distribution μ_A × ν_A converges in distribution as A ↗ Z^d to the
interacting particle system with infinitesimal generator G and initial distribution μ.
Since μ_A × ν_A is an equilibrium distribution for the transition semigroup associated
with G_A, it follows that μ is an equilibrium distribution for the transition
semigroup associated with G. □


Example 7. Let G be the infinitesimal generator of a particle jump process
with exclusion, and let μ_p be Bernoulli product measure on Ξ with parameter
p ∈ [0, 1], which is determined by the following:


$$\mu_p\{\xi : \xi(x) = 1,\ x \in A\} = p^{\sharp A} \quad \text{for all finite } A \subset \mathbb{Z}^d.$$


We will check that μ_p satisfies the criterion of the preceding theorem.

Since n = 1, it can be shown with standard arguments that every function
in 𝔉 is a finite linear combination of the functions f_A, A a finite subset of Z^d,
where


$$f_A(\xi) = \prod_{x \in A} \xi(x).$$


Thus, it is enough to check (32.2) for f = f_A. Straightforward calculations give


$$\int G f_A \, d\mu_p = p^{\sharp A}(1 - p) \sum_{x \in A,\ y \notin A} \bigl(\rho\{x - y\} - \rho\{y - x\}\bigr),$$


where ρ is the step distribution used in the definition of the particle jump process
with exclusion. We leave it to the reader to check that the sum in this expression
is 0.
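
The vanishing of the sum can be confirmed numerically before proving it. The Python sketch below does so in dimension d = 1 with exact rational arithmetic, for a deliberately asymmetric step distribution. The dictionary representation of ρ (step mapped to probability), the particular sets A, and the function name `boundary_sum` are illustrative assumptions.

```python
from fractions import Fraction

def boundary_sum(A, rho):
    """Sum of rho{x - y} - rho{y - x} over x in A and y not in A (d = 1).

    Only sites y within one step of some x in A can contribute, so the
    infinite sum reduces to a finite one.
    """
    A = set(A)
    total = Fraction(0)
    for x in A:
        candidates = {x + s for s in rho} | {x - s for s in rho}
        for y in candidates - A:
            total += rho.get(x - y, 0) - rho.get(y - x, 0)
    return total

# An asymmetric nearest-neighbor step distribution and a non-contiguous set A.
rho = {1: Fraction(3, 4), -1: Fraction(1, 4)}
result = boundary_sum({0, 1, 5}, rho)
```

The general reason is that each of the two sums, over x ∈ A and y ∉ A, of ρ{x − y} and of ρ{y − x} equals ♯A minus the corresponding sum over pairs x, y ∈ A, and the latter two sums coincide after exchanging the roles of x and y.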




32.5. Systems with attractive infinitesimal generators 


In the study of Markov sequences with countable state spaces, we found that 
an irreducible transition operator T has a unique equilibrium distribution μ if
and only if T is positive recurrent, in which case any Markov sequence X with
transition operator T has the property that X_n converges in distribution to μ as
n → ∞. For interacting particle systems, the story is not so simple. However,
there is one class of systems for which we can use the theory developed in this 
chapter to make some useful observations. One characteristic of systems in this 
class is a lack of wrap-around behavior. Also for simplicity, we will restrict our 
attention to systems for which the particle jump rates are all 0. 

For ξ, η ∈ Ξ, we write ξ ≥ η provided that ξ(x) ≥ η(x) for all x ∈ Z^d. A
function f: Ξ → R is increasing if f(ξ) ≥ f(η) whenever ξ ≥ η. A function f is
decreasing if −f is increasing.

We now define a property of birth and death rates which will ensure that any 
transition semigroup defined in terms of such rates (with particle jump rates 
equal to 0) will consist of transition operators that take increasing functions to 
increasing functions. 


Definition 4. An infinitesimal generator G defined as in (32.1) with particle 
jump rates equal to 0 is attractive if for all x ∈ Z^d,


(i) there is no wrap-around at x, that is, b_x(ξ) = 0 if ξ(x) equals the
maximum particle number and d_x(ξ) = 0 if ξ(x) = 0;
(ii) if ξ ≥ η and ξ(x) = η(x), then b_x(ξ) ≥ b_x(η);
(iii) if ξ ≥ η and ξ(x) = η(x), then d_x(ξ) ≤ d_x(η).


Conditions (ii) and (iii) in Definition 4 do not say that the birth rates are 
increasing and the death rates are decreasing. In fact, because of (i), increasing 
attractive birth rates and decreasing attractive death rates are necessarily equal 
to 0. 
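
The three conditions of Definition 4 can be checked by brute force when the lattice is replaced by a small finite set of sites. The following Python sketch does this on a ring of four sites with maximum particle number 1. The ring, the rate functions `contact_birth`, `contact_death`, and `inhibited_birth`, and the parameter `LAM` are illustrative assumptions that do not appear in the text; the first pair of rates imitates a contact process with linear birth rates, and the second birth rate is a made-up example that violates condition (ii).

```python
from itertools import product

def is_attractive(birth, death, num_sites, max_particles):
    """Brute-force check of conditions (i)-(iii) on a finite set of sites."""
    configs = list(product(range(max_particles + 1), repeat=num_sites))
    for xi in configs:
        for x in range(num_sites):
            # (i) no wrap-around at x
            if xi[x] == max_particles and birth(xi, x) != 0:
                return False
            if xi[x] == 0 and death(xi, x) != 0:
                return False
        for eta in configs:
            # only ordered pairs xi >= eta are relevant
            if not all(a >= b for a, b in zip(xi, eta)):
                continue
            for x in range(num_sites):
                if xi[x] != eta[x]:
                    continue
                if birth(xi, x) < birth(eta, x):   # violates (ii)
                    return False
                if death(xi, x) > death(eta, x):   # violates (iii)
                    return False
    return True

NUM_SITES = 4   # hypothetical finite ring standing in for the lattice
LAM = 2.0       # hypothetical birth parameter

def neighbors(xi, x):
    """Number of particles at the two ring neighbors of x."""
    return xi[(x - 1) % NUM_SITES] + xi[(x + 1) % NUM_SITES]

def contact_birth(xi, x):
    return LAM * neighbors(xi, x) if xi[x] == 0 else 0.0

def contact_death(xi, x):
    return 1.0 if xi[x] == 1 else 0.0

def inhibited_birth(xi, x):
    # births are suppressed by occupied neighbors, so (ii) should fail
    return LAM if xi[x] == 0 and neighbors(xi, x) == 0 else 0.0

print(is_attractive(contact_birth, contact_death, NUM_SITES, 1))
print(is_attractive(inhibited_birth, contact_death, NUM_SITES, 1))
```

A finite check of this kind proves nothing about the infinite system, but it is a quick way to catch a rate specification that fails (ii) or (iii).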


Theorem 5. Let $G$ be an attractive infinitesimal generator having bounded
finite-range rates, and let $(T_t: t \in [0,\infty))$ denote the corresponding transition
semigroup. If $f: \Xi \to \mathbb{R}$ is a bounded measurable increasing function, then $T_t f$
is an increasing function for all $t \in [0,\infty)$.


PROOF. Fix $\xi, \eta \in \Xi$ such that $\xi \ge \eta$. Let $X^{\xi}, X^{\eta}$ be interacting particle
systems with infinitesimal generator $G$, constructed by way of the universal
coupling, as in Section 2. By the rules in Theorem 2, $X^{\xi}_t \ge X^{\eta}_t$ for all $t \in [0,\infty)$.
Since $T_t f(\xi) = E(f(X^{\xi}_t))$ and $T_t f(\eta) = E(f(X^{\eta}_t))$, the theorem follows. □


Corollary 6. Let $G$ be an attractive infinitesimal generator with bounded
finite-range rates, $(T_t: t \in [0,\infty))$ the corresponding transition semigroup, and
$\mu_{t,\xi}$, $\xi \in \Xi$, $t \in [0,\infty)$, the associated transition distributions. Then the limits

$$\mu^- = \lim_{t\to\infty} \mu_{t,0} \quad\text{and}\quad \mu^+ = \lim_{t\to\infty} \mu_{t,\bar n}$$

exist, where $0$ is the configuration in which all sites are vacant and $\bar n$ is the
configuration in which all sites are occupied by $n$ particles (the maximum number), and
$\mu^-$ and $\mu^+$ are both equilibrium distributions for $(T_t: t \in [0,\infty))$. Furthermore,
if $\mu^- = \mu^+$, then there are no other equilibrium distributions for this transition
semigroup, and

$$\lim_{t\to\infty} \mu_{t,\xi} = \mu^- = \mu^+$$

for all $\xi \in \Xi$.


PROOF. By the definition of transition semigroup,

$$T_{t+s} f(\xi) = \int T_t f(\eta)\, \mu_{s,\xi}(d\eta)$$

for all $s,t \in [0,\infty)$, $\xi \in \Xi$, and bounded measurable functions $f: \Xi \to \mathbb{R}$. If $f$ is
increasing, Theorem 5 implies that $T_t f$ is increasing, so

$$T_{t+s} f(0) \ge \int T_t f(0)\, \mu_{s,0}(d\eta) = T_t f(0).$$

Thus the function $t \mapsto T_t f(0)$ is increasing in $t$ for all bounded measurable
increasing $f$, and hence

$$\lim_{t\to\infty} \int f \, d\mu_{t,0} \tag{32.4}$$


exists. It is easy to show that every function in § is a finite linear combination
of bounded measurable increasing functions, so (32.4) exists for all $f \in$ §. It follows
from Problem 3 that (32.4) holds for all continuous $f$. Since $\Xi$ is compact,
the collection $(\mu_{t,0}: t \in [0,\infty))$ is relatively sequentially compact by
Proposition 14 of Chapter 18. Since (32.4) holds for all continuous $f$, Proposition 15 of
Chapter 18 implies that the limit $\mu^-$ exists. Similarly, the limit $\mu^+$ exists. We
leave it to the reader to show that $\mu^-$ and $\mu^+$ are equilibrium distributions (see
Problem 19 below).

As we have already seen, $T_t f$ is increasing if $f$ is. For such $f$ it follows that
the functions $\liminf_{t\to\infty} T_t f$ and $\limsup_{t\to\infty} T_t f$ are increasing. Thus the first
of these is greater than or equal to $\int f \, d\mu^-$ and the latter is less than or equal
to $\int f \, d\mu^+$. Therefore, if $\mu^- = \mu^+$, then

$$\lim_{t\to\infty} \int f \, d\mu_{t,\xi} = \lim_{t\to\infty} T_t f(\xi) = \int f \, d\mu^- = \int f \, d\mu^+$$

for all increasing $f$ and all $\xi$. As in the argument of the preceding paragraph,
we may extend this equation to all continuous functions $f$. The final assertion
in the corollary then follows. □


The measure $\mu^-$ in the preceding corollary is the lower equilibrium
distribution and $\mu^+$ is the upper equilibrium distribution of the transition semigroup
corresponding to $G$.




Because of Theorem 5, Corollary 6, and related results, much more is known 
about attractive interacting particle systems than nonattractive ones. The fol- 
lowing example is typical of the kind of result that can be relatively easily ob- 
tained in the attractive setting. 


Example 8. We take the dimension $d$ to be 1, and let the range $r$ and
maximum particle number $n$ be arbitrary positive integers. In this setting, let $G$ be an
attractive generator with translation invariant rates, and suppose that $b_x(0) = 0$
for all $x \in \mathbb{Z}$. Let $X$ be the interacting particle system with generator $G$ and
initial state $\xi_0$, where

$$\xi_0(x) = \begin{cases} n & \text{if } x \le 0 \\ 0 & \text{otherwise.} \end{cases}$$

For $t \in [0,\infty)$, let

$$Z_t = \sup\{x \in \mathbb{Z}: X_t(x) > 0\}.$$

Thus, $Z_t$ is the rightmost occupied site at time $t$. Of course, $Z_0 = 0$. We claim
that $Z_t/t$ converges a.s. to a constant $R \in [-\infty,\infty)$ as $t \to \infty$. The limit $R$ is
called the right edge speed. There is an analogous definition for the left edge
speed $L$, using the initial state in which the sites $x < 0$ are vacant and the sites
$x \ge 0$ are maximally occupied.

The proof of the claim uses the Kingman-Liggett Subadditive Ergodic Theorem.
In order to use that theorem, we define the appropriate processes $Z_{m,n}$.
Assume that $X$ has been constructed using the universal coupling, and let
$Z_{0,n} = Z_n \vee (-ln)$, $n = 1,2,\dots$, where $l$ is a fixed positive integer. Eventually,
we will let $l \to \infty$.

Fix $m > 0$. We will define $Z_{m,n}$ in terms of an interacting particle system
$X^{(m)}$ with random initial state $\xi_m$, where

$$\xi_m(x) = \begin{cases} n & \text{if } x \le Z_{0,m} \\ 0 & \text{otherwise.} \end{cases}$$


In order to construct $X^{(m)}$, we first create some new Poisson point processes
from the ones used to construct $X$. We do this by subtracting $m$ from the first
coordinates of each of the points in all of the original Poisson point processes,
and then intersecting each of the shifted point processes with $(0,\infty) \times (0,\infty)$.
The resulting Poisson point processes on $(0,\infty) \times (0,\infty)$ are independent of $Z_{0,m}$,
and have the same distribution as the original point processes that were used to
construct $X$. We use these shifted point processes to construct $X^{(m)}$ according
to the universal construction given in Section 2, and then for $m < n$, set

$$Z_{m,n} = \bigl(\sup\{x \in \mathbb{Z}: X^{(m)}_{n-m}(x) > 0\} - Z_{0,m}\bigr) \vee (-l(n-m)).$$


We now indicate why the hypotheses of the Kingman-Liggett Theorem are
satisfied. It should be clear from the construction that the distribution of the
random sequence $(Z_{k,k+n}: n = 1,2,\dots)$ is the same for all $k \ge 0$, and that
the random sequence $(Z_{kn,k(n+1)}: n = 0,1,2,\dots)$ is iid for each such $k$, hence
stationary and ergodic. It can be shown as in the proof of Theorem 5 that
because the generator is attractive, $X^{(m)}_{n-m}(x) \ge X_n(x)$ for all $x \in \mathbb{Z}$. It follows
from the definition of $Z_{m,n}$ that $Z_{0,n} \le Z_{0,m} + Z_{m,n}$ for $0 < m < n$. Clearly
$E(Z_{0,n}) \ge -ln$. The reader is requested in Problem 23 to show that $E(Z_{0,n}) \le \gamma n$
for some finite constant $\gamma$. Thus, the hypotheses of the Kingman-Liggett
Theorem are satisfied.

Let $R_l = \lim_{n\to\infty}(Z_{0,n}/n)$. It is not hard to show that there exists a constant
$R \in [-\infty,\infty)$ such that

$$R = \lim_{l\to\infty} R_l = \lim_{n\to\infty} \frac{Z_n}{n} = \lim_{t\to\infty} \frac{Z_t}{t}.$$

It can be shown that when $r = 1$, $\mu^+ = \mu^-$ if and only if $R < L$. The complete
proof of this remarkable result is quite difficult. However, readers who have
mastered the material in this chapter should be able to prove the ‘if direction’,
for arbitrary values of the range $r$ (see Problem 24 for the case $R < 0 < L$).


Problem 19. Complete the proof of Corollary 6 by showing that $\mu^-$ and $\mu^+$ are
equilibrium distributions of the transition semigroup $(T_t: t \in [0,\infty))$.


Problem 20. Show that $\mu^+$ and $\mu^-$ are extremal in the set of equilibrium
distributions of the corresponding transition semigroup, in the sense that neither can
be written as a nontrivial convex combination of two different equilibrium
distributions for that semigroup.


Problem 21. Which of the following types of interacting particle systems have at- 
tractive infinitesimal generators: contact processes with linear birth rates, contact 
processes with threshold birth rates, contact processes with sexual reproduction, 
cyclic threshold processes, majority vote processes? 


Problem 22. Let $X$ and $Y$ be two interacting particle systems with attractive
generators $G_X$ and $G_Y$ and initial states $\xi_X$ and $\xi_Y$ respectively. Suppose that
each birth rate of the generator $G_X$ is greater than or equal to the corresponding
birth rate of the generator $G_Y$, and that each death rate of the generator $G_X$ is less
than or equal to the corresponding death rate of the generator $G_Y$. Show that if $X$
and $Y$ are coupled with the universal coupling and $X_0(x) \ge Y_0(x)$ for all $x \in \mathbb{Z}^d$,
then $X_t(x) \ge Y_t(x)$ for all $t \in [0,\infty)$ and $x \in \mathbb{Z}^d$.


Problem 23. Prove the three facts about the random variables $Z_{m,n}$ that were left
unproved in Example 8, namely, (i) that $E(Z_{0,n}) \le \gamma n$ for some finite $\gamma$, (ii) that
$(Z_n/n) \to R$ a.s. as $n \to \infty$, and (iii) that $(Z_t/t) \to R$ a.s. as $t \to \infty$. Hint: The
‘right edge’ can move at most $r$ units to the right in one transition, and the rate
at which it moves to the right is at most $r$ times the maximum birth rate.


* Problem 24. Let $X^{\xi}$ be an interacting particle system with initial state $\xi$ and
generator $G$ satisfying the conditions of Example 8. Show that if $R < 0 < L$, then

$$\lim_{t\to\infty} X^{\xi}_t = 0 \quad \text{a.s.}$$


CHAPTER 33 
Diffusions and Stochastic Calculus 


A diffusion is a time-homogeneous continuous-in-time strong Markov process.
Most often, the state space is $\mathbb{R}^d$, although other spaces are also considered,
especially in current research.

No one knows how to characterize or construct all diffusions, even in the
$\mathbb{R}^d$-setting (except for the case $d = 1$). Since our intention is only to provide a
brief introduction to diffusions, we will focus most of our attention on the state
space $\mathbb{R}$, and even within that restricted setting, on a class of diffusions that is
particularly well understood. Our main emphasis will be on those results that
generalize relatively easily to the $\mathbb{R}^d$-setting with $d > 1$, and in the final section
of the chapter, we will say a little about how that generalization is accomplished.

As in the previous two chapters, it will be useful to describe these Markov 
processes in terms of generators and solutions to the martingale problem. This 
approach leads to a nice connection between diffusions and certain types of par- 
tial differential equations. 

However, there is another approach to diffusions that is in many ways more 
natural than the generator approach. We will develop a ‘stochastic calculus’, 
and construct diffusions as solutions of ‘stochastic differential equations’. This 
approach is the one that is most closely associated with applications in physics, 
signal processing, and economics, among other fields. We will see that Brown- 
ian motion and other continuous-time martingales play important roles. The 
generator approach will be discussed later in the chapter. 


33.1. Stochastic difference equations 


Let $W$ be a Wiener process, and let $a: \mathbb{R} \to [0,\infty)$ and $b: \mathbb{R} \to \mathbb{R}$ be measurable
functions. Given an $\varepsilon > 0$, consider the equations

$$Z_{(n+1)\varepsilon} = Z_{n\varepsilon} + a(Z_{n\varepsilon})(W_{(n+1)\varepsilon} - W_{n\varepsilon}) + b(Z_{n\varepsilon})\varepsilon, \tag{33.1}$$

$n = 0,1,2,\dots$. Given an initial value $Z_0 = z$, these equations have a unique
solution, which is a random sequence that can be calculated recursively in terms
of the Wiener process $W$.

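Let us rewrite (33.1) with some notation that emphasizes the fact that it is a
‘difference equation’. Given an $\mathbb{R}$-valued function $f$ whose domain includes the
set $\{0, \varepsilon, 2\varepsilon, \dots\}$, let

$$\Delta_\varepsilon f(t) = f(t+\varepsilon) - f(t), \quad t \in \{0, \varepsilon, 2\varepsilon, \dots\}.$$

Then (33.1) becomes

$$\Delta_\varepsilon Z_t = a(Z_t)\,\Delta_\varepsilon W_t + b(Z_t)\,\Delta_\varepsilon t, \quad t \in \{0, \varepsilon, 2\varepsilon, \dots\}. \tag{33.2}$$

We call (33.2) a stochastic difference equation with coefficients $a$ and $b$.

Let $Z^{(\varepsilon)}$ denote the solution of (33.2) with initial value $Z_0 = z$. It is easy
to verify that $Z^{(\varepsilon)}$ is a Markov sequence (see Problem 1). Our intention is to
obtain a diffusion as some sort of limit as $\varepsilon \searrow 0$ of $Z^{(\varepsilon)}$.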
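
The recursion (33.1) translates directly into a short program. The Python sketch below is only an illustration: the function names, the seed, and the particular coefficients $a(x) = \sqrt{1 + x^2}$ and $b(x) = -x$ are assumptions chosen for the example, and the Wiener increments are simulated as independent normal random variables with mean 0 and variance $\varepsilon$.

```python
import math
import random

def wiener_increments(num_steps, eps, rng):
    """Simulated increments W_{(n+1)eps} - W_{n eps}, iid normal(0, eps)."""
    return [rng.gauss(0.0, math.sqrt(eps)) for _ in range(num_steps)]

def solve_difference_equation(a, b, z, eps, increments):
    """Return [Z_0, Z_eps, Z_{2 eps}, ...] computed recursively from (33.1)."""
    path = [z]
    for dw in increments:
        current = path[-1]
        path.append(current + a(current) * dw + b(current) * eps)
    return path

rng = random.Random(0)          # fixed seed, only for reproducibility
eps = 0.01
dws = wiener_increments(100, eps, rng)

# hypothetical coefficients, chosen only for illustration
path = solve_difference_equation(lambda x: math.sqrt(1.0 + x * x),
                                 lambda x: -x, 1.0, eps, dws)

# with constant coefficients the recursion telescopes to z + a*W + b*t
constant_path = solve_difference_equation(lambda x: 2.0, lambda x: 3.0,
                                          1.0, eps, dws)
expected_end = 1.0 + 2.0 * sum(dws) + 3.0 * eps * len(dws)
print(path[-1], constant_path[-1], expected_end)
```

For constant coefficients the last two printed numbers agree up to rounding, which is a convenient sanity check on the recursion.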

As a first step in this direction, we make $Z^{(\varepsilon)}$ into a $C[0,\infty)$-valued random
variable by defining

$$Z^{(\varepsilon)}_t = Z^{(\varepsilon)}_{n\varepsilon} + a(Z^{(\varepsilon)}_{n\varepsilon})(W_t - W_{n\varepsilon}) + b(Z^{(\varepsilon)}_{n\varepsilon})(t - n\varepsilon) \tag{33.3}$$

for $t \in (n\varepsilon, (n+1)\varepsilon)$, $n = 0,1,2,\dots$. We will eventually show (under certain
conditions on the coefficients) that the collection $(Z^{(\varepsilon)}: \varepsilon > 0)$ is Cauchy in
probability as $\varepsilon \searrow 0$. Since $C[0,\infty)$ is a Polish space, it follows that there is a
limit $Z$. As will be seen, this limit is a diffusion that is the unique solution of a
‘stochastic differential equation’ related to (33.2).

We now introduce some more notation. For $\varepsilon > 0$, let $r_\varepsilon: [0,\infty) \to [0,\infty)$ be
the approximation of the identity function defined by

$$r_\varepsilon(t) = \varepsilon \lfloor t/\varepsilon \rfloor. \tag{33.4}$$

Then it is easy to check that $Z^{(\varepsilon)}$ is the unique solution of

$$Z_t = z + \int_0^t a(Z_{r_\varepsilon(s)})\, dW_s + \int_0^t b(Z_{r_\varepsilon(s)})\, ds, \quad t \in [0,\infty), \tag{33.5}$$

where the first integral in this equation is a Riemann-Stieltjes integral with
respect to the random continuous function $W$. This integral exists because the
function $s \mapsto a(Z_{r_\varepsilon(s)})$ is piecewise constant. We regard (33.5) as the ‘stochastic
integral’ form of the stochastic difference equation (33.2). The diffusions we will
construct are solutions of equations like (33.5), with $r_\varepsilon(s)$ replaced by $s$.
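
To make the role of $r_\varepsilon$ concrete, the following Python sketch evaluates (33.4) and computes a Riemann-Stieltjes integral of an integrand that is constant on each interval $[n\varepsilon, (n+1)\varepsilon)$. The names `r_eps` and `step_integral` and the test integrator $w(s) = s$ are assumptions made for illustration; in (33.5) the integrator would be a path of $W$. The value $\varepsilon = 0.25$ is used because it is exactly representable in binary floating point, which keeps the floor in (33.4) free of rounding surprises.

```python
import math

def r_eps(t, eps):
    """The grid point n*eps with n*eps <= t < (n+1)*eps, as in (33.4)."""
    return eps * math.floor(t / eps)

def step_integral(grid_values, w, eps, t):
    """Riemann-Stieltjes integral over [0, t] of s -> grid_values[floor(s/eps)]
    with respect to the continuous function w (a callable)."""
    total = 0.0
    n = 0
    while n * eps < t:
        left = n * eps
        right = min((n + 1) * eps, t)   # the last interval may be cut off at t
        total += grid_values[n] * (w(right) - w(left))
        n += 1
    return total

print(r_eps(0.6, 0.25))                                  # grid point below 0.6
print(step_integral([1.0, 2.0, 3.0], lambda s: s, 0.25, 0.6))
```

With the integrator $w(s) = s$ the second value is the ordinary integral of a step function, namely $1(0.25) + 2(0.25) + 3(0.1)$.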


Problem 1. Show that the solution of (33.1), with initial value $z$, is a Markov
sequence with respect to the filtration $(\mathcal{F}_{n\varepsilon+}: n = 0,1,2,\dots)$, where for each
$t \in [0,\infty)$, $\mathcal{F}_t = \sigma(W_s: s \le t)$.




* Problem 2. Let $Z^{(\varepsilon)}$ be the solution of (33.5), with initial value $z$. Assume that
the absolute values of coefficients $a$ and $b$ are bounded above by a function of the
form $x \mapsto c|x| + d$ for some finite constants $c, d$. Show that there exist finite constants
$c'$ and $d'$ not depending on $\varepsilon$ such that


E A | < d'e” 


for all u € [0,00). Hint: First prove the inequality by induction for u = ke, 
kE | Po re ae mee 


33.2. The Itô integral


In the previous section we encountered an integral with respect to a Wiener 
process. Since the integrand was piecewise constant, we were able to treat it as 
a Riemann-Stieltjes integral. This type of integral is not sufficient for the more 
general integrands that we will be considering, so in this section we develop a 
new type of integral. 

We begin by describing an appropriate class of integrands. Let $W$ be a Wiener
process defined on a probability space $(\Omega, \mathcal{F}, P)$, and $(\mathcal{F}_{t+}: t \in [0,\infty))$ the
minimal right-continuous filtration of $W$. A nonanticipating $W$-functional is
a $D[0,\infty)$-valued random variable $X = (X_t: t \in [0,\infty))$ defined on $(\Omega, \mathcal{F}, P)$
that is adapted to a filtration of the form $(\sigma(\mathcal{F}_{t+}, \mathcal{G}): t \in [0,\infty))$, where $\mathcal{G}$ is a
$\sigma$-field that is independent of $W$. Our goal is to define the integral with respect
to $W$ of every nonanticipating $W$-functional $X$. We will accomplish this by taking
an appropriate limit as $\varepsilon \searrow 0$ of Riemann-Stieltjes integrals with respect
to $W$ of certain piecewise constant nonanticipating $W$-functionals. We need the
following result about integrals of such functionals.


Lemma 1. Let $W$ be a Wiener process, and $X$ a bounded nonanticipating
$W$-functional for which there exists a sequence of times $0 = t_0 < t_1 < t_2 < \cdots \to \infty$
such that $X_t = X_{t_n}$ for $t \in [t_n, t_{n+1})$, $n = 0,1,2,\dots$. For $t \in [0,\infty)$, let $\mathfrak{I}_t$ be
the Riemann-Stieltjes integral

$$\mathfrak{I}_t = \int_0^t X \, dW.$$

Then the $C[0,\infty)$-valued random variable $\mathfrak{I} = (\mathfrak{I}_t: t \in [0,\infty))$ is a continuous-time
martingale with respect to the minimal right-continuous filtration of $W$, and
for each $t$,

$$E(\mathfrak{I}_t) = 0 \quad\text{and}\quad \operatorname{Var}(\mathfrak{I}_t) = E\Bigl(\int_0^t X_u^2 \, du\Bigr).$$
(0) 


PROOF. To show that $\mathfrak{I}$ is a continuous-time martingale with respect to the
given filtration, it is enough to show that

$$E\Bigl(\int_s^t X \, dW \Bigm| \mathcal{F}_{s+}\Bigr) = 0 \tag{33.6}$$

for $0 \le s < t < \infty$. Since $X$ is constant on each interval $[t_n, t_{n+1})$, the
Riemann-Stieltjes integral $\int_s^t X \, dW$ is a sum of terms of the form

$$\int_u^v X \, dW = X_u (W_v - W_u), \tag{33.7}$$

where $v > u \ge s$ and the intervals $[u,v]$ for various summands do not overlap
except possibly at endpoints. Since $W$ is a Wiener process, $\mathcal{F}_{u+}$ and $(W_v - W_u)$
are independent. Since $X$ is a nonanticipating $W$-functional, $X_u$ is measurable
with respect to $\sigma(\mathcal{F}_{u+}, \mathcal{G})$ for some $\sigma$-field $\mathcal{G}$ that is independent of $W$, so $X_u$
and $(W_v - W_u)$ are conditionally independent given $\mathcal{F}_{s+}$, by Proposition 26 of
Chapter 21. Thus, the conditional expectation given $\mathcal{F}_{s+}$ of each term of the
form (33.7) is 0. The equality (33.6) follows immediately.

Setting $s = 0$ in (33.6) and taking expectations of both sides gives $E(\mathfrak{I}_t) = 0$
for all $t$. It is easy to show that the martingale differences (33.7) are
uncorrelated for various $u$ and $v$ satisfying the nonoverlapping condition of the
preceding paragraph. Thus the variance of $\mathfrak{I}_t$ can be obtained by summing the
variances of those terms. Using the independence of $X_u$ and $(W_v - W_u)$, we see
that the variance of (33.7) is

$$(v - u)\,E(X_u^2).$$

The desired formula for $\operatorname{Var}(\mathfrak{I}_t)$ follows. □


To now define the integral of an arbitrary nonanticipating $W$-functional $X$
with respect to $W$, we introduce the notation $X^{(\varepsilon)}$ for the piecewise constant
$D[0,\infty)$-valued random variable defined by

$$X^{(\varepsilon)}_t = X_{r_\varepsilon(t)}, \quad t \in [0,\infty), \tag{33.8}$$

with $r_\varepsilon$ defined by (33.4), and $\mathfrak{I}^{(\varepsilon)}$ for the $C[0,\infty)$-valued random variable defined
by

$$\mathfrak{I}^{(\varepsilon)}_t = \int_0^t X^{(\varepsilon)} \, dW, \quad t \in [0,\infty),$$

where the right side is a Riemann-Stieltjes integral. Note that $X^{(\varepsilon)}$ inherits
the property of being a nonanticipating $W$-functional from $X$, and that if $X$ is
bounded, then Lemma 1 applies to $\mathfrak{I}^{(\varepsilon)}$, since in that case $X^{(\varepsilon)}$ is bounded and
piecewise constant.


Definition and Proposition 2. Let $W$ be a Wiener process on $[0,\infty)$,
defined on a probability space $(\Omega, \mathcal{F}, P)$, and let $X$ be a nonanticipating
$W$-functional. The Itô integral of $X$ is the $C[0,\infty)$-valued random variable $\mathfrak{I} =
(\mathfrak{I}_t: t \in [0,\infty))$, where

$$\mathfrak{I} = \lim_{\varepsilon \searrow 0} \mathfrak{I}^{(\varepsilon)} \quad \text{i.p.}$$

For $0 \le s < t < \infty$, the Itô integral of $X$ from $s$ to $t$ equals $\mathfrak{I}_t - \mathfrak{I}_s$ and is denoted
by

$$\int_s^t X \, dW \quad\text{or}\quad \int_s^t X_u \, dW_u.$$

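PROOF. We need to prove that the collection $(\mathfrak{I}^{(\varepsilon)}: \varepsilon > 0)$ is Cauchy in
probability as $\varepsilon \searrow 0$. Since we are working in the Polish space $C[0,\infty)$, it is
sufficient to prove the following for each $t \in [0,\infty)$ and $\delta > 0$:

$$\lim_{\varepsilon,\eta \searrow 0} P\Bigl[\sup_{s \in [0,t]} |\mathfrak{I}^{(\varepsilon)}_s - \mathfrak{I}^{(\eta)}_s| > \delta\Bigr] = 0.$$

First consider the case in which $X$ is bounded. By Lemma 1, $\mathfrak{I}^{(\varepsilon)}$ is a
martingale for each $\varepsilon > 0$. By the Kolmogorov Inequality of Chapter 24 (which can
be extended in a straightforward manner to cadlag-valued martingales),

$$P\Bigl[\sup_{s \in [0,t]} |\mathfrak{I}^{(\varepsilon)}_s - \mathfrak{I}^{(\eta)}_s| > \delta\Bigr] \le \frac{E\bigl([\mathfrak{I}^{(\varepsilon)}_t - \mathfrak{I}^{(\eta)}_t]^2\bigr)}{\delta^2}.$$

By Lemma 1, the numerator of the right side of this inequality equals

$$E\Bigl(\int_0^t (X^{(\varepsilon)}_u - X^{(\eta)}_u)^2 \, du\Bigr).$$

Since $X$ is assumed to be cadlag-valued, $X^{(\varepsilon)}_u$ and $X^{(\eta)}_u$ converge to $X_u$ as $\varepsilon, \eta \searrow 0$
for $\lambda$-a.e. $u \in [0,\infty)$, where $\lambda$ denotes Lebesgue measure. The desired conclusion
follows from the Bounded Convergence Theorem.

For general nonanticipating $W$-functionals $X$, we use the fact that every
$\mathbb{R}$-valued cadlag function is bounded on bounded intervals. Therefore

$$\lim_{n \to \infty} P\bigl[X_s = (-n \vee X_s) \wedge n \text{ for all } s \in [0,t]\bigr] = 1.$$

For $n = 1,2,\dots$, let $A_n$ be the event in this last expression. Then the argument
in the bounded case implies that

$$\lim_{\varepsilon,\eta \searrow 0} P\Bigl(A_n \cap \Bigl[\sup_{s \in [0,t]} |\mathfrak{I}^{(\varepsilon)}_s - \mathfrak{I}^{(\eta)}_s| > \delta\Bigr]\Bigr) = 0$$

for all $\delta > 0$ and $n = 1,2,\dots$. Since $P(A_n) \to 1$ as $n \to \infty$, the desired result
follows.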


Problem 3. Let $X$ be a nonanticipating $W$-functional and $T$ an a.s.-finite stopping
time with respect to the minimal right-continuous filtration of $W$. Show that
$(X_{T+t} - X_T: t \in [0,\infty))$ is a nonanticipating $\widetilde{W}$-functional, where $\widetilde{W}$ is the
process $(W_{T+t} - W_T: t \in [0,\infty))$.


Problem 4. Let $Z^{(\varepsilon)}$ be the solution of (33.2). Show that if the absolute values of
the coefficients $a$ and $b$ are each bounded above by some polynomial, then $Z^{(\varepsilon)}$ is
a square-integrable nonanticipating $W$-functional.




Problem 5. If $X$ is a nonanticipating $W$-functional such that the Riemann-Stieltjes
integral of $X$ with respect to $W$ on $[0,t]$ exists with positive probability, we now
have two interpretations of the expression $\int_0^t X \, dW$. Are these interpretations in
agreement?


Problem 6. Show that the Itô integral is linear in the sense that if $X$ and $Y$ are
nonanticipating $W$-functionals, then

$$\int_s^t [\alpha X + \beta Y] \, dW = \alpha \int_s^t X \, dW + \beta \int_s^t Y \, dW$$

for all times $0 \le s < t < \infty$ and real constants $\alpha$ and $\beta$.


Problem 7. Let $f: [0,\infty) \to \mathbb{R}$ be a (nonrandom) cadlag function. Show that
the Itô integral $\mathfrak{I}$ of $f$ is a Gaussian process, which is to say that for all times
$t_1, \dots, t_d \in [0,\infty)$, $(\mathfrak{I}_{t_1}, \dots, \mathfrak{I}_{t_d})$ is normally distributed. Also show that $\mathfrak{I}$ has
mean function 0 and covariance function

$$(s,t) \mapsto \int_0^{s \wedge t} f^2(u) \, du.$$


We have chosen to define the Itô integral $\mathfrak{I}$ as a limit in probability of
$C[0,\infty)$-valued random variables $\mathfrak{I}^{(\varepsilon)}$, thereby simultaneously defining the random
variables $\mathfrak{I}_t$, $t \in [0,\infty)$, in such a way that $\mathfrak{I}$ is $C[0,\infty)$-valued. One consequence of
this definition is that $\lim_{\varepsilon \searrow 0} \mathfrak{I}^{(\varepsilon)}_t = \mathfrak{I}_t$ i.p. for each fixed $t$. Under an additional
assumption, we strengthen this latter result with the following lemma.


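Lemma 3. Let $X$ be a nonanticipating $W$-functional, and define $X^{(\varepsilon)}$ for
$\varepsilon > 0$ by (33.8). If

$$E\Bigl(\int_0^t X_u^2 \, du\Bigr) < \infty, \tag{33.9}$$

then

$$\lim_{\varepsilon \searrow 0} E\Bigl(\Bigl[\int_0^t (-n \vee X^{(\varepsilon)}_u) \wedge n \, dW_u - \int_0^t (-n \vee X_u) \wedge n \, dW_u\Bigr]^2\Bigr) = 0 \tag{33.10}$$

and

$$\lim_{n \to \infty} E\Bigl(\Bigl[\int_0^t (-n \vee X_u) \wedge n \, dW_u - \int_0^t X_u \, dW_u\Bigr]^2\Bigr) = 0. \tag{33.11}$$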


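PROOF. It follows from the Bounded Convergence Theorem and Lemma 1
that

$$\lim_{\varepsilon,\eta \searrow 0} E\Bigl(\Bigl[\int_0^t (-n \vee X^{(\varepsilon)}_u) \wedge n \, dW_u - \int_0^t (-n \vee X^{(\eta)}_u) \wedge n \, dW_u\Bigr]^2\Bigr) = 0,$$

so

$$\varepsilon \mapsto \int_0^t (-n \vee X^{(\varepsilon)}_u) \wedge n \, dW_u$$

is Cauchy in $L_2(\Omega, \mathcal{F}, P)$ as $\varepsilon \searrow 0$. We know from the definition of the Itô
integral that the random variables in this collection converge in probability to
$\int_0^t (-n \vee X_u) \wedge n \, dW_u$ as $\varepsilon \searrow 0$. Equation (33.10) follows.

It follows from (33.10), Lemma 1, and the Bounded Convergence Theorem
that

$$E\Bigl(\Bigl[\int_0^t (-n \vee X_u) \wedge n \, dW_u - \int_0^t (-m \vee X_u) \wedge m \, dW_u\Bigr]^2\Bigr)$$
$$= E\Bigl(\int_0^t [(-n \vee X_u) \wedge n]^2 \, du\Bigr) + E\Bigl(\int_0^t [(-m \vee X_u) \wedge m]^2 \, du\Bigr)$$
$$\quad - 2E\Bigl(\int_0^t [(-m \vee X_u) \wedge m]\,[(-n \vee X_u) \wedge n] \, du\Bigr)$$

for $m,n = 1,2,\dots$. By the Monotone Convergence Theorem, each of the
expectations on the right side of this equality converges to

$$E\Bigl(\int_0^t X_u^2 \, du\Bigr)$$

as $m,n \to \infty$, so the sequence

$$\Bigl(\int_0^t (-n \vee X_u) \wedge n \, dW_u : n = 1,2,\dots\Bigr)$$

is Cauchy in $L_2(\Omega, \mathcal{F}, P)$. It was shown at the end of the proof of Proposition 2
that this sequence converges in probability to $\mathfrak{I}_t$, so (33.11) follows. □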


A nonanticipating W-functional that satisfies (33.9) for all t € [0, 00) is said 
to be square-integrable. Taken together, Lemma 1 and Lemma 3 immediately 
imply the following important result for square-integrable nonanticipating W- 
functionals. 


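Theorem 4. Let $\mathfrak{I}$ be the Itô integral of a square-integrable nonanticipating
$W$-functional $X$. Then $\mathfrak{I}$ is a martingale with respect to the minimal
right-continuous filtration of $W$, and for all $t \in [0,\infty)$,

$$E(\mathfrak{I}_t) = 0 \quad\text{and}\quad \operatorname{Var}(\mathfrak{I}_t) = E\Bigl(\int_0^t X_u^2 \, du\Bigr).$$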
0 


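Corollary 5. Let $X, X^{(n)}$, $n = 1,2,\dots$, be square-integrable nonanticipating
$W$-functionals. Then

$$\lim_{n \to \infty} E\Bigl(\Bigl[\int_0^t X^{(n)} \, dW - \int_0^t X \, dW\Bigr]^2\Bigr) = 0$$

if and only if

$$\lim_{n \to \infty} E\Bigl(\int_0^t [X^{(n)}_u - X_u]^2 \, du\Bigr) = 0.$$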


Problem 8. Prove Theorem 4 and its corollary. 




Problem 9. Let $\mathfrak{I}$ be the Itô integral of a square-integrable nonanticipating
$W$-functional $X$. Show that $\mathfrak{I}$ is a square-integrable nonanticipating $W$-functional.


Example 1. Since $W$ is itself a nonanticipating $W$-functional, it has an Itô
integral, which we now calculate. Letting $\varepsilon = t/n$ in the definition, we have

$$\int_0^t W \, dW = \lim_{n \to \infty} \sum_{k=0}^{n-1} W_{kt/n}\bigl(W_{(k+1)t/n} - W_{kt/n}\bigr)$$
$$= \lim_{n \to \infty} \frac{1}{2} \sum_{k=0}^{n-1} \Bigl[\bigl(W^2_{(k+1)t/n} - W^2_{kt/n}\bigr) - \bigl(W_{(k+1)t/n} - W_{kt/n}\bigr)^2\Bigr]$$
$$= \frac{W_t^2}{2} - \frac{1}{2} \lim_{n \to \infty} \sum_{k=0}^{n-1} \bigl(W_{(k+1)t/n} - W_{kt/n}\bigr)^2 \quad \text{i.p.}$$

The terms in the last sum are iid, with mean $t/n$ and variance $2(t/n)^2$, so the
sum itself has mean $t$ and variance $2t^2/n$. It follows that the sum converges to
$t$ in $L_2(\Omega, \mathcal{F}, P)$, and hence in probability, as $n \to \infty$. We have shown that

$$\int_0^t W \, dW = \frac{W_t^2}{2} - \frac{t}{2}.$$

The second term on the right side of the formula just obtained is perhaps
unexpected. If $W$ were Riemann-Stieltjes integrable with respect to itself, that
term would not be present. On the other hand, if that term were not present, the
Itô integral of $W$ would not be a continuous-time martingale, in contradiction
to Theorem 4.
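
The calculation in Example 1 is easy to test numerically. The Python sketch below simulates one Wiener path on a fine grid and compares the left-endpoint sum $\sum_k W_{kt/n}(W_{(k+1)t/n} - W_{kt/n})$ with $W_t^2/2 - t/2$. The function names, the seed, and the grid size are assumptions chosen for illustration; the discrepancy equals half the difference between the sum of squared increments and $t$, which has standard deviation $t/\sqrt{2n}$.

```python
import math
import random

def sample_wiener_path(t, n, rng):
    """Values W_0, W_{t/n}, ..., W_t of a simulated Wiener path."""
    dt = t / n
    w = [0.0]
    for _ in range(n):
        w.append(w[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return w

def left_endpoint_sum(w):
    """Sum of W at the left endpoint times the increment over each interval."""
    return sum(w[k] * (w[k + 1] - w[k]) for k in range(len(w) - 1))

rng = random.Random(1)          # fixed seed, only for reproducibility
t, n = 1.0, 50_000
w = sample_wiener_path(t, n, rng)

ito_sum = left_endpoint_sum(w)
closed_form = w[-1] ** 2 / 2 - t / 2
print(ito_sum, closed_form, abs(ito_sum - closed_form))
```

Replacing the left endpoint $W_{kt/n}$ by the right endpoint $W_{(k+1)t/n}$ in the sum changes the limit to $W_t^2/2 + t/2$, which is one way to see that the choice of evaluation point matters.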


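Problem 10. Let $X$ be a nonanticipating $W$-functional. Show that

$$\lim_{k \to \infty} \sum_{j=0}^{k-1} X_{jt/k}\bigl[W_{(j+1)t/k} - W_{jt/k}\bigr]^2 = \int_0^t X_u \, du \quad \text{i.p.}$$

for $t \in [0,\infty)$.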


33.3. Stochastic differentials and the Itô Lemma


The example at the end of the preceding section shows that the ordinary rules
of calculus do not apply to Itô integrals. In this section, we will introduce some
of the new rules of the stochastic calculus.

Suppose that $Z = (Z_t: t \in [0,\infty))$ satisfies

$$Z_t - Z_0 = \int_0^t X_u \, dW_u + \int_0^t Y_u \, du, \quad t \in [0,\infty), \tag{33.12}$$
0 0 




where X and Y are nonanticipating W-functionals. The C[0, oo)-valued random 
variable Z is called a stochastic integral. Notice that stochastic integrals are 
themselves nonanticipating W-functionals. 

We now introduce a more compact way of writing (33.12):

$$dZ = X \, dW + Y \, dt. \tag{33.13}$$

For example, the result of our calculation in Example 1 of $\int_0^t W \, dW$ is written
compactly as

$$d(W^2) = 2W \, dW + dt.$$


The quantities dZ and dW in this expression are called stochastic differentials, 
and the expression itself is called a stochastic differential equation. The quantity 
dt is a differential, although under certain circumstances we will also call it a 
stochastic differential, just as we sometimes use the term ‘random variable’ to 
describe a nonrandom constant. 

It is important to realize that (33.13) is simply a compact way of writing 
(33.12). There is no difference in meaning between the two. One reason for 
using the more compact form is that it naturally leads to an interpretation of 
integration with respect to $Z$. If $X$, $Y$, and $Z$ satisfy (33.13), and if $U$ is a
nonanticipating $W$-functional, then the stochastic integral

$$\int_0^t XU \, dW + \int_0^t YU \, du, \quad t \in [0,\infty),$$

is denoted by

$$\int_0^t U \, dZ, \quad t \in [0,\infty);$$

or more compactly, we write

$$U \, dZ = XU \, dW + YU \, dt.$$


Another reason for using (33.13) to denote stochastic integrals is that the 
rules of stochastic calculus are easier to remember in that form. The following 
lemma provides the basis for deriving such rules. 


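Lemma 6. [Itô] Suppose $X^{(i)}, Y^{(i)}, Z^{(i)}$, $i = 1,\dots,n$, are nonanticipating $W$-functionals
satisfying

$$dZ^{(i)} = X^{(i)} \, dW + Y^{(i)} \, dt, \quad i = 1,\dots,n.$$

Set

$$Z_t = (t, Z^{(1)}_t, \dots, Z^{(n)}_t), \quad t \in [0,\infty),$$

and let $f: [0,\infty) \times \mathbb{R}^n \to \mathbb{R}$ be a function with continuous partial derivatives
$D_i f$, $i = 0,\dots,n$, and continuous second-order partial derivatives $D_{ij} f$, $i,j =
1,\dots,n$. Then

$$d(f \circ Z) = D_0 f \circ Z \, dt + \sum_{i=1}^n D_i f \circ Z \, dZ^{(i)} + \frac{1}{2} \sum_{i,j=1}^n X^{(i)} X^{(j)} D_{ij} f \circ Z \, dt.$$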



PROOF. The first part of the proof is a series of reductions to increasingly
simple cases of the theorem. The reader is requested to verify the validity of
these reductions in Problem 11. An argument based on the definitions implies
that it is sufficient to prove the theorem for the case in which there exists a
sequence of times $0 = t_0 < t_1 < t_2 < \dots$ increasing to $\infty$ such that each $X^{(i)}$ and
$Y^{(i)}$ is constant on intervals of the form $[t_n, t_{n+1})$. By considering separately the
integrals over each such interval, it can be shown that it is sufficient to prove the
theorem for random constant functions $X^{(i)}$ and $Y^{(i)}$. Since a nonanticipating
random constant $W$-functional is necessarily independent of $W$, the theorem
can be reduced to the case in which $X^{(i)}$ and $Y^{(i)}$ are nonrandom constants by
conditioning. A further straightforward argument shows that this latter case is
equivalent to the case in which $n = 1$, $X^{(1)} = 1$, and $Y^{(1)} = 0$. A limiting
argument shows that it is enough to consider bounded functions $f$ such that
$D_0 f$, $D_1 f$, and $D_{11} f$ are also bounded, and a further limiting argument allows
us to assume that the second partial derivatives $D_{00} f$ and $D_{01} f$ exist and are
continuous and bounded.

Thus, it remains to show that if $f: [0,\infty) \times \mathbb{R} \to \mathbb{R}$ is bounded and has
bounded continuous first- and second-order partial derivatives, then

$$f(t, W_t) - f(0, W_0) = \int_0^t \bigl(D_0 f(u, W_u) + \tfrac{1}{2} D_{11} f(u, W_u)\bigr) \, du + \int_0^t D_1 f(u, W_u) \, dW_u \tag{33.14}$$

for all $t \in [0,\infty)$. For each positive integer $k$, the left side of (33.14) equals

$$\sum_{j=0}^{k-1} \bigl[f\bigl(\tfrac{(j+1)t}{k}, W_{(j+1)t/k}\bigr) - f\bigl(\tfrac{jt}{k}, W_{jt/k}\bigr)\bigr].$$

By the Taylor Theorem with remainder for functions of 2 variables, we may write
the terms in this sum as

$$D_0 f\bigl(\tfrac{jt}{k}, W_{jt/k}\bigr)\tfrac{t}{k} + D_1 f\bigl(\tfrac{jt}{k}, W_{jt/k}\bigr)\bigl[W_{(j+1)t/k} - W_{jt/k}\bigr] + \tfrac{1}{2} D_{11} f\bigl(\tfrac{jt}{k}, W_{jt/k}\bigr)\bigl[W_{(j+1)t/k} - W_{jt/k}\bigr]^2 + E(j,k),$$

where the absolute value of the error term $E(j,k)$ is bounded above by a positive
random variable $C(k)$ times

$$\tfrac{t}{k}\bigl(1 + |W_{(j+1)t/k} - W_{jt/k}|\bigr) + \bigl(W_{(j+1)t/k} - W_{jt/k}\bigr)^2, \tag{33.15}$$

with $C(k) \to 0$ a.s. as $k \to \infty$.

By the definition of the Riemann and Itô integrals and Problem 10, the first
three terms in the Taylor approximation converge i.p. to the right side of (33.14).
So it is enough to show that $\sum_j E(j,k) \to 0$ i.p. as $k \to \infty$. By the uniform
continuity of $W$ on $[0,t]$, the sum over $j$ of the first term in (33.15) is almost
surely bounded above by some constant as $k \to \infty$. In Example 1, we showed
that the sum over $j$ of the second term converges to $t$ in probability. The desired
result follows from the fact that $C(k) \to 0$ i.p. as $k \to \infty$. □
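
As a small illustration of (33.14), take $f(t,w) = tw$, a function chosen here only as an example. Then $D_0 f = w$, $D_1 f = t$, and $D_{11} f = 0$, so (33.14) reads $tW_t = \int_0^t W_u \, du + \int_0^t u \, dW_u$. The Python sketch below checks this on a simulated path by replacing both integrals with left-endpoint sums; the function name, the seed, and the grid size are assumptions. On a grid with $k$ steps the two sides differ by exactly $(t/k)W_t$, which is the sum of the products of the time and Wiener increments.

```python
import math
import random

def ito_check_for_tw(t_end, k, rng):
    """Return (lhs, rhs) of (33.14) for f(t, w) = t*w on a k-step grid."""
    dt = t_end / k
    w = 0.0
    dw_integral = 0.0   # left-endpoint sum for the integral of D_1 f = u against dW
    du_integral = 0.0   # left-endpoint sum for the integral of D_0 f = W_u against du
    for j in range(k):
        u = j * dt
        dw = rng.gauss(0.0, math.sqrt(dt))
        dw_integral += u * dw
        du_integral += w * dt
        w += dw
    lhs = t_end * w - 0.0   # f(t, W_t) - f(0, W_0)
    rhs = du_integral + dw_integral
    return lhs, rhs

rng = random.Random(2)          # fixed seed, only for reproducibility
lhs, rhs = ito_check_for_tw(1.0, 10_000, rng)
print(lhs, rhs, abs(lhs - rhs))
```

Because $D_{11} f = 0$ for this choice of $f$, no correction term appears; for $f(t,w) = w^2$ the same experiment would reproduce the extra $dt$ term of Example 1.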


Problem 11. Justify the series of reductions in the first paragraph of the proof of
the Itô Lemma.


* Problem 12. Use the Itô Lemma to calculate $d(f \circ W)$ for $f(x) = x^p$, $p = 1,2,\dots$,
and $f(w) = e^{aw}$, $\sin aw$, $\cos aw$, $a \in \mathbb{R}$.


Problem 13. Use your answer to the previous problem to find a solution of the
stochastic differential equation $dZ = aZ \, dW$, where $a$ is a real constant. Hint:
The solution takes the form $t \mapsto f_1(t) f_2(W_t)$.


Example 2. [Brownian local-time process] Let us calculate $d(|W|)$. The Itô
Lemma does not apply directly, since the absolute value function is not
differentiable at 0, so we will need a limiting argument.

For $\delta, \varepsilon > 0$, let $f_{\delta,\varepsilon}: \mathbb{R} \to \mathbb{R}$ be the unique function determined by the
following conditions: (i) $f_{\delta,\varepsilon}(0) = f'_{\delta,\varepsilon}(0) = 0$; (ii) $f''_{\delta,\varepsilon}(x) = 1/\delta$ for $|x| \le \delta$; (iii)
$f''_{\delta,\varepsilon}(x) = 0$ for $|x| \ge \delta + \varepsilon$; (iv) $f''_{\delta,\varepsilon}$ is linear on $[-\delta - \varepsilon, -\delta]$ and $[\delta, \delta + \varepsilon]$. Thus,
$f_{\delta,\varepsilon}$ approximates the absolute value function from below, and has continuous
first and second derivatives, with the second derivative being piecewise linear. As
$\varepsilon \searrow 0$, $f_{\delta,\varepsilon}$ converges uniformly to a function $f_\delta$ with continuous piecewise linear
first derivative, and as $\delta \searrow 0$, $f_\delta$ converges uniformly to the function $x \mapsto |x|$.

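We now apply the Itô Lemma to $f_{\delta,\varepsilon}$. After a slight rearrangement, we have

$$f_{\delta,\varepsilon}(W_t) - \int_0^t f'_{\delta,\varepsilon}(W_u) \, dW_u = \frac{1}{2} \int_0^t f''_{\delta,\varepsilon}(W_u) \, du.$$

Letting $\varepsilon \searrow 0$ and applying Corollary 5 to the left side and the Bounded
Convergence Theorem to the right side, we obtain

$$f_\delta(W_t) - \int_0^t f'_\delta(W_u) \, dW_u = \frac{1}{2\delta} \int_0^t I_{[-\delta,\delta]}(W_u) \, du. \tag{33.16}$$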


We would like to use Corollary 5 to take the limit as $\delta \searrow 0$ of the left side of
(33.16). The limit of the left side would be

$$|W_t| - \int_0^t \operatorname{sgn}(W_u) \, dW_u, \tag{33.17}$$

where $\operatorname{sgn}(x) = I_{(0,\infty)}(x) - I_{(-\infty,0)}(x)$. Unfortunately $\operatorname{sgn} \circ W$ does not have
cadlag paths, so its Itô integral is not defined according to the definition we have
given. In a more thorough treatment, we would extend the definition to
accommodate such integrands. But in this example, we content ourselves with noting
that it is not hard to see that the relevant arguments of the previous section
(particularly the proof of Corollary 5 and the use of the Kolmogorov Inequality
in the construction of the Itô integral) can be extended to this situation to show
that as $\delta \searrow 0$ the Itô integral of $f'_\delta \circ W$ converges i.p. to a $C[0,\infty)$-valued random
variable. It is natural to denote the limit by the integral that appears in (33.17).
Thus we have

$$d(|W|) = \operatorname{sgn} \circ W \, dW + dL,$$

where $L = (L_t: t \in [0,\infty))$ is the $C[0,\infty)$-valued random variable defined by
letting $L_t$ equal the i.p. limit as $\delta \searrow 0$ of the right side of (33.16). We call $L$ the
Brownian local-time process at 0.

More generally, we may imitate the argument just given to calculate $d(|W - x|)$
for each fixed $x \in \mathbb{R}$. As a result, a $C[0,\infty)$-valued random variable $L(x) =
(L_t(x): t \in [0,\infty))$ is constructed, with

$$|W_t - x| - |x| - \int_0^t \operatorname{sgn}(W_u - x) \, dW_u = L_t(x) = \lim_{\delta \searrow 0} \frac{1}{2\delta} \int_0^t I_{[-\delta+x,\,\delta+x]}(W_u) \, du \quad \text{i.p.}$$

This random variable is known as the Brownian local-time process at $x$. It can
be shown that the random function $x \mapsto L_t(x)$ is a.s. continuous, and that for all
Borel sets $B$,

$$\int_B L_t(x) \, dx = \int_0^t I_B(W_u) \, du. \tag{33.18}$$

Thus, the Brownian local-time process is the density with respect to Lebesgue
measure of a random measure on $\mathbb{R}$ that gives the ‘occupation time’ of Brownian
motion.
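
The description of local time as an occupation density suggests a crude numerical estimate: measure the time a simulated path spends within $\delta$ of $x$ and divide by $2\delta$, as on the right side of (33.16). The Python sketch below does this with a Riemann sum over left endpoints. The function name `local_time_estimate`, the seed, and the grid are assumptions made for illustration, and the estimate is only meaningful when the time step is small compared with $\delta^2$.

```python
import math
import random

def local_time_estimate(left_values, dt, x, delta):
    """(1 / (2 delta)) times the time spent in [x - delta, x + delta],
    where left_values holds the path at the left endpoint of each time step."""
    time_near_x = dt * sum(1 for w in left_values if abs(w - x) <= delta)
    return time_near_x / (2.0 * delta)

rng = random.Random(4)          # fixed seed, only for reproducibility
t_end, n = 1.0, 20_000
dt = t_end / n

# path values W_0, W_dt, ..., W_{(n-1) dt}
path = [0.0]
for _ in range(n - 1):
    path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))

for delta in (0.2, 0.1, 0.05):
    print(delta, local_time_estimate(path, dt, 0.0, delta))
```

The printed values for decreasing $\delta$ are rough approximations of $L_t(0)$ for this one path; they fluctuate, since the convergence in the text is only in probability.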


Problem 14. Calculate $d(W^+)$. Your answer should involve Brownian local-time
process and an Itô integral whose integrand is not cadlag-valued, as in Example 2.


* Problem 15. Prove (33.18). 


33.4. Autonomous stochastic differential equations 
In this section we consider the problem

$$dZ = a \circ Z \, dW + b \circ Z \, dt, \quad Z_0 = z, \tag{33.19}$$

where $z$ is an arbitrary real number and $a$ and $b$ are functions from $\mathbb{R}$ to $\mathbb{R}$
that satisfy certain other conditions. The requirement $Z_0 = z$ in (33.19) is the
initial condition, $z$ itself is the initial value, and the equation is an autonomous
stochastic differential equation. The adjective ‘autonomous’ refers to the fact
that $W$ and $t$ appear in it only as differentials; in particular, the coefficients $a$
and $b$ do not depend on $W$ or $t$.

We restrict our attention to autonomous equations since we are interested 
in constructing diffusions, and because of a lack of time-homogeneity, nonau- 
tonomous equations typically do not have solutions that are diffusions. When 




the additional conditions that we place on $a$ and $b$ hold, we will show that (33.19)
has a unique solution Z that is a diffusion. 


Example 3. If $a$ and $b$ are constants, then $Z = t \rightsquigarrow aW_t + bt + z$ satisfies
(33.19). Note that the solution $Z^{(\varepsilon)}$ of the stochastic difference equation (33.2) is
of this form on each of the intervals $[n\varepsilon, (n+1)\varepsilon)$. Our construction of solutions
of (33.19) for more general coefficients will imply that such solutions are in a
sense 'locally' of this form (see also Problem 18). For this reason, the function
$a^2$ is known as the diffusion coefficient and the function $b$ is known as the drift
coefficient.


Our method of finding a solution of (33.19) is to take a limit as $\varepsilon \searrow 0$ of
the solution $Z^{(\varepsilon)}$ of the stochastic difference equation (33.2). Readers familiar
with numerical solutions of ordinary differential equations should see that this
approach is a stochastic version of the 'Euler method'. Since we will want to
work in the Polish space $C[0,\infty)$, we will actually take the limit of solutions of
(33.5), which is the stochastic integral form of (33.2).
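The stochastic Euler method can be sketched in a few lines. The code below assumes that the difference equation has the usual Euler form $Z_{(n+1)\varepsilon} = Z_{n\varepsilon} + a(Z_{n\varepsilon})(W_{(n+1)\varepsilon} - W_{n\varepsilon}) + b(Z_{n\varepsilon})\,\varepsilon$ and computes only the grid values; the function names and the way increments are passed in are illustrative choices, not notation from the text.

```python
import math
import random


def brownian_increments(eps, n_steps, seed=0):
    """Independent N(0, eps) increments W_{(n+1)eps} - W_{n eps}."""
    rng = random.Random(seed)
    sd = math.sqrt(eps)
    return [rng.gauss(0.0, sd) for _ in range(n_steps)]


def euler_path(a, b, z, eps, increments):
    """Grid values Z_0, Z_eps, Z_{2 eps}, ... of the Euler scheme."""
    path = [z]
    for dw in increments:
        cur = path[-1]
        # Coefficients are frozen at the left endpoint of each step.
        path.append(cur + a(cur) * dw + b(cur) * eps)
    return path
```

For constant coefficients the scheme reproduces $aW_t + bt + z$ exactly at the grid points, in agreement with Example 3; for nonconstant coefficients it is only an approximation whose quality depends on $\varepsilon$.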

In the following, we use the terminology bounded slope to describe a function
$f\colon \mathbb{R} \to \mathbb{R}$ if there exists a finite constant $c$ such that

$\left|\frac{f(x) - f(y)}{x - y}\right| \le c, \quad x \ne y.$

(Such functions are also called 'uniformly Lipschitz continuous'.) Note that
under this condition, $f$ is automatically continuous, and

(33.20)    $|f(x)| \le c|x| + d$


for some finite constant d, so Problem 2 applies to the solutions of stochastic 
difference equations with coefficients that have bounded slope. 
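The growth bound (33.20) follows from the bounded-slope condition in one line. The sketch below takes $y = 0$ and the particular choice $d = |f(0)|$, which is one admissible constant rather than the only one.

```latex
|f(x)| \;\le\; |f(x) - f(0)| + |f(0)| \;\le\; c\,|x - 0| + |f(0)| \;=\; c\,|x| + d,
\qquad d = |f(0)| .
```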


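Theorem 7. Let $a, b\colon \mathbb{R} \to \mathbb{R}$ be functions with bounded slope, $z$ a real number,
and for $\varepsilon > 0$, let $Z^{(\varepsilon)}$ be the solution of (33.5). Then the limit

$Z = \lim_{\varepsilon\searrow 0} Z^{(\varepsilon)}$ i.p.

exists, and $Z$ is the unique solution of (33.19). Furthermore, $Z$ is a
square-integrable nonanticipating $W$-functional, and

(33.21)    $\lim_{\varepsilon\searrow 0} E\Big(\int_0^t [Z_u - Z_u^{(\varepsilon)}]^2\,du\Big) = 0, \quad t \in [0,\infty).$
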
PARTIAL PROOF. We will show that the limit Z exists and satisfies (33.19), 
from which it also follows that Z is a nonanticipating W-functional. In the 
course of the proof, we will also see that Z is square-integrable and that (33.21) 
holds. The proof of uniqueness is requested in Problem 16. 




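To show the existence of the limit $Z$, we prove that $\varepsilon \rightsquigarrow Z^{(\varepsilon)}$ is Cauchy in
probability as $\varepsilon \searrow 0$. Fix $\varepsilon, \eta > 0$ and $t \in [0,\infty)$. From (33.5) we obtain the
following:

\[
\sup_{0\le s\le t} \big|Z_s^{(\varepsilon)} - Z_s^{(\eta)}\big|
\le \sup_{0\le s\le t} \Big|\int_0^s \big[a(Z^{(\varepsilon)}_{\varepsilon\lfloor u/\varepsilon\rfloor}) - a(Z^{(\eta)}_{\eta\lfloor u/\eta\rfloor})\big]\,dW_u\Big|
+ \int_0^t \big|b(Z^{(\varepsilon)}_{\varepsilon\lfloor u/\varepsilon\rfloor}) - b(Z^{(\eta)}_{\eta\lfloor u/\eta\rfloor})\big|\,du .
\]

Denote the two terms on the right side by $A_t$ and $B_t$ respectively. We have

\[
P\Big[\sup_{0\le s\le t} \big|Z_s^{(\varepsilon)} - Z_s^{(\eta)}\big| > \delta\Big] \le P[A_t > \delta/2] + P[B_t > \delta/2]
\]

for $\delta > 0$.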

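By Problem 4 and Theorem 4, the Itô integral in the expression for $A_t$ is a
continuous-time martingale. By the Kolmogorov Inequality of Chapter 24 and
the expression for the variance in Theorem 4,

\[
P[A_t > \delta/2] \le \frac{4}{\delta^2}\int_0^t E\Big(\big[a(Z^{(\varepsilon)}_{\varepsilon\lfloor u/\varepsilon\rfloor}) - a(Z^{(\eta)}_{\eta\lfloor u/\eta\rfloor})\big]^2\Big)\,du .
\]

Since $a$ has bounded slope, it follows that there exists a finite constant $c'$ such
that

\[
P[A_t > \delta/2] \le \frac{c'}{\delta^2}\int_0^t E\Big(\big[Z^{(\varepsilon)}_{\varepsilon\lfloor u/\varepsilon\rfloor} - Z^{(\eta)}_{\eta\lfloor u/\eta\rfloor}\big]^2\Big)\,du .
\]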


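Similar reasoning (with the Chebyshev Inequality replacing the Kolmogorov
Inequality) gives

\[
P[B_t > \delta/2] \le \frac{c''}{\delta^2}\int_0^t E\Big(\big[Z^{(\varepsilon)}_{\varepsilon\lfloor u/\varepsilon\rfloor} - Z^{(\eta)}_{\eta\lfloor u/\eta\rfloor}\big]^2\Big)\,du
\]

for some finite constant $c''$.
Thus, in order to show that $\varepsilon \rightsquigarrow Z^{(\varepsilon)}$ is Cauchy in probability, it suffices to
show that

(33.22)    $\lim_{\varepsilon,\eta\searrow 0} E\big[(Z_t^{(\varepsilon)} - Z_t^{(\eta)})^2\big] = 0$

uniformly for $t$ in any bounded interval.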




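By definition

\[
\begin{aligned}
\big[Z_t^{(\varepsilon)} - Z_t^{(\eta)}\big]^2
&= \Big(\int_0^t \big[a(Z^{(\varepsilon)}_{\varepsilon\lfloor u/\varepsilon\rfloor}) - a(Z^{(\eta)}_{\eta\lfloor u/\eta\rfloor})\big]\,dW_u
 + \int_0^t \big[b(Z^{(\varepsilon)}_{\varepsilon\lfloor u/\varepsilon\rfloor}) - b(Z^{(\eta)}_{\eta\lfloor u/\eta\rfloor})\big]\,du\Big)^2 \\
&= \Big(\int_0^t \big[a(Z_u^{(\varepsilon)}) - a(Z_u^{(\eta)})\big]\,dW_u
 + \int_0^t \big[b(Z_u^{(\varepsilon)}) - b(Z_u^{(\eta)})\big]\,du \\
&\qquad + \int_0^t \big[a(Z^{(\varepsilon)}_{\varepsilon\lfloor u/\varepsilon\rfloor}) - a(Z_u^{(\varepsilon)})\big]\,dW_u
 + \int_0^t \big[b(Z^{(\varepsilon)}_{\varepsilon\lfloor u/\varepsilon\rfloor}) - b(Z_u^{(\varepsilon)})\big]\,du \\
&\qquad + \int_0^t \big[a(Z_u^{(\eta)}) - a(Z^{(\eta)}_{\eta\lfloor u/\eta\rfloor})\big]\,dW_u
 + \int_0^t \big[b(Z_u^{(\eta)}) - b(Z^{(\eta)}_{\eta\lfloor u/\eta\rfloor})\big]\,du\Big)^2 .
\end{aligned}
\]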


The right side of this last expression is the square of a sum of 6 terms. Re- 
peated use of the elementary inequality 


(33.23)    $(x + y)^2 \le 2x^2 + 2y^2$


shows that this squared sum is bounded above by 8 times the sum of the squares 
of the 6 terms. Take expectations of both sides, apply Theorem 4 to those terms 
on the right that contain an Itô integral, apply the Cauchy-Schwarz Inequality 
to the remaining terms on the right, and then use the Fubini Theorem to bring 
the expectation inside the integrals. The resulting inequality is 


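\[
\begin{aligned}
E\big(\big[Z_t^{(\varepsilon)} - Z_t^{(\eta)}\big]^2\big)
&\le 8\int_0^t E\big(\big[a(Z_u^{(\varepsilon)}) - a(Z_u^{(\eta)})\big]^2\big)\,du
 + 8t\int_0^t E\big(\big[b(Z_u^{(\varepsilon)}) - b(Z_u^{(\eta)})\big]^2\big)\,du \\
&\quad + 8\int_0^t E\big(\big[a(Z^{(\varepsilon)}_{\varepsilon\lfloor u/\varepsilon\rfloor}) - a(Z_u^{(\varepsilon)})\big]^2\big)\,du
 + 8t\int_0^t E\big(\big[b(Z^{(\varepsilon)}_{\varepsilon\lfloor u/\varepsilon\rfloor}) - b(Z_u^{(\varepsilon)})\big]^2\big)\,du \\
&\quad + 8\int_0^t E\big(\big[a(Z_u^{(\eta)}) - a(Z^{(\eta)}_{\eta\lfloor u/\eta\rfloor})\big]^2\big)\,du
 + 8t\int_0^t E\big(\big[b(Z_u^{(\eta)}) - b(Z^{(\eta)}_{\eta\lfloor u/\eta\rfloor})\big]^2\big)\,du .
\end{aligned}
\]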
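Letting $c$ be the upper bound on the slopes of $a$ and $b$, we obtain

(33.24)
\[
\begin{aligned}
E\big(\big[Z_t^{(\varepsilon)} - Z_t^{(\eta)}\big]^2\big)
&\le 8c^2(1+t)\int_0^t E\big(\big[Z_u^{(\varepsilon)} - Z_u^{(\eta)}\big]^2\big)\,du \\
&\quad + 8c^2(1+t)\int_0^t E\big(\big[Z^{(\varepsilon)}_{\varepsilon\lfloor u/\varepsilon\rfloor} - Z_u^{(\varepsilon)}\big]^2\big)\,du \\
&\quad + 8c^2(1+t)\int_0^t E\big(\big[Z_u^{(\eta)} - Z^{(\eta)}_{\eta\lfloor u/\eta\rfloor}\big]^2\big)\,du .
\end{aligned}
\]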
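The factors 8 and $(1+t)$ in (33.24) can be traced as follows. This is a sketch of the bookkeeping only: $x_1,\dots,x_6$ stand for the six terms in the order listed above, $X$ for a generic square-integrable integrand, and the variance formula is the one the text attributes to Theorem 4.

```latex
% Split six terms into 3 + 3, then each triple into 1 + 2, using (33.23) each time:
(x_1+\dots+x_6)^2 \le 2(x_1+x_2+x_3)^2 + 2(x_4+x_5+x_6)^2, \qquad
(x_1+x_2+x_3)^2 \le 2x_1^2 + 4x_2^2 + 4x_3^2,
% so no coefficient exceeds 8:
(x_1+\dots+x_6)^2 \le 8\,(x_1^2+\dots+x_6^2).
% Ito-integral terms (variance formula) and Riemann-integral terms (Cauchy-Schwarz):
E\Big(\Big[\int_0^t X_u\,dW_u\Big]^2\Big) = \int_0^t E(X_u^2)\,du, \qquad
\Big(\int_0^t X_u\,du\Big)^2 \le t\int_0^t X_u^2\,du .
```

Adding the coefficient 1 from each $dW$ term to the coefficient $t$ from the matching $du$ term gives the common factor $(1+t)$.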


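A straightforward calculation using (33.3) and Problem 2 shows that (for
$\varepsilon < 1$)

\[
E\big(\big[Z^{(\varepsilon)}_{\varepsilon\lfloor u/\varepsilon\rfloor} - Z_u^{(\varepsilon)}\big]^2\big) \le d'\varepsilon e^{c'u}
\]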




for some finite constants $c', d'$. A similar bound applies to the analogous term
involving $\eta$. Put these bounds into (33.24) to see that there exists a function
$g\colon [0,\infty) \to [0,\infty)$ that is bounded on bounded intervals such that

(33.25)    $\varphi(t) \le g(t)\Big(\int_0^t \varphi(u)\,du + (\varepsilon + \eta)\Big),$

where

$\varphi(u) = E\big(\big[Z_u^{(\varepsilon)} - Z_u^{(\eta)}\big]^2\big).$

Since $\varphi(0) = 0$, it follows from the Gronwall Inequality in the theory of ordinary
differential equations that if $t$ is restricted to a bounded interval, then there exist
constants $c'', d''$ such that

$\varphi(t) \le d''(\varepsilon + \eta)\big(e^{c''t} - 1\big).$

The constants $c'', d''$ depend only on the right endpoint of the bounded interval
and the constants $c', d'$. The right side of this expression converges to 0 uniformly
for $t$ in bounded intervals as $\varepsilon, \eta \searrow 0$. So we have proved (33.22), and the desired
convergence in probability follows.

Let $Z$ be the limit i.p. of $Z^{(\varepsilon)}$ as $\varepsilon \searrow 0$. It follows from (33.22) that $Z$ is
square-integrable and that (33.21) holds. Since $a$ and $b$ have bounded slope, it
follows from Corollary 5 that the Itô integral in (33.5) converges i.p. to the Itô
integral of $a\circ Z$ as $\varepsilon \searrow 0$, and it follows from the Uniform Integrability Criterion
and Problem 12 of Chapter 8 that the Riemann integral in (33.5) converges to
the Riemann integral of $b\circ Z$. Since the left side of (33.5) converges i.p. to $Z_t$ as
$\varepsilon \searrow 0$, we may conclude that $Z$ satisfies (33.19), thus completing the proof. □
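The Cauchy property behind (33.22) can be observed numerically. The sketch below is an illustration only: it drives Euler schemes with step sizes $\varepsilon$ and $\varepsilon/2$ by the same simulated Wiener increments and averages the squared difference of the two values at time $t$ over independent paths. The Euler form of the scheme, the function name, and the sample coefficients in the usage note are assumptions made for the example.

```python
import math
import random


def mean_square_gap(a, b, z, eps, t=1.0, n_paths=200, seed=0):
    """Monte Carlo estimate of E[(Z_t^(eps) - Z_t^(eps/2))^2]."""
    rng = random.Random(seed)
    n = round(t / eps)          # number of coarse steps
    half = eps / 2.0
    sd = math.sqrt(half)
    total = 0.0
    for _ in range(n_paths):
        coarse = fine = z
        for _ in range(n):
            # Two fine increments; their sum is the coarse increment.
            dw1 = rng.gauss(0.0, sd)
            dw2 = rng.gauss(0.0, sd)
            coarse += a(coarse) * (dw1 + dw2) + b(coarse) * eps
            fine += a(fine) * dw1 + b(fine) * half
            fine += a(fine) * dw2 + b(fine) * half
        total += (coarse - fine) ** 2
    return total / n_paths
```

With bounded-slope coefficients such as `a = lambda x: 1.0 + 0.5 * math.sin(x)` and `b = lambda x: -x`, the estimate should shrink as `eps` decreases, which is the behaviour the proof establishes in the limit.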


Corollary 8. Let $a$ and $b$ have bounded slope. Then the solutions of (33.19)
with initial values $z \in \mathbb{R}$ form a strong Markov family with respect to the minimal
right-continuous filtration of $W$.


PROOF. Let $T$ be an a.s.-finite stopping time with respect to $(\mathcal{F}_{t+}\colon t \in [0,\infty))$,
the minimal right-continuous filtration of $W$. It is easy to see that
$\tilde{Z} = (Z_{T+t}\colon t \in [0,\infty))$ satisfies the stochastic differential equation

$d\tilde{Z} = a\circ\tilde{Z}\,d\tilde{W} + b\circ\tilde{Z}\,dt$

with random initial value $Z_T$, where $\tilde{W}$ is the Wiener process $(W_{T+t} - W_T\colon t \in [0,\infty))$.
It follows from the convergence portion of Theorem 7 that there is a
measurable function $h\colon \mathbb{R}\times C[0,\infty) \to C[0,\infty)$ such that $h(z, W)$ is the unique
solution of (33.19). Thus, $h(Z_T, \tilde{W}) = \tilde{Z}$ a.s. Since $\tilde{W}$ is independent of $\mathcal{F}_{T+}$
and $Z_T$ is measurable with respect to $\mathcal{F}_{T+}$, it follows from Problem 21 and
Proposition 12, both of Chapter 21, that the conditional distribution of $\tilde{Z}$ given
$\mathcal{F}_{T+}$ is the same as that of the solution of (33.19) with initial value $z = Z_T$. The
desired conclusion now follows from Problem 12 of Chapter 31. □




As in the preceding proof, it is often useful to consider solutions of stochas- 
tic differential equations with random initial conditions. It is easy to see that 
Theorem 7 allows us to do so. However, in order to ensure that the solution be 
Markovian, it is important to assume that the initial value be independent of 
the Wiener process W. We will always make such an assumption. 


Problem 16. Prove the uniqueness assertion in Theorem 7. Hint: Suppose $Z$ and
$Z'$ are both solutions of (33.19). Let $\varphi(t) = E([Z_t - Z_t']^2)$. Mimic the part of the
proof of Theorem 7 that leads up to the use of the Gronwall Inequality to show
that $\varphi(t) = 0$ for all $t$.


* Problem 17. Show that the Markov family of solutions of (33.19) has a Feller 
transition semigroup. 


Problem 18. Let $Z$ be the solution of (33.19), the coefficients $a$ and $b$ being assumed
to have bounded slope. Show that for any a.s.-finite stopping time $T$ with
respect to the minimal right-continuous filtration of $W$, the conditional distribution
of

$\frac{Z_{T+h} - Z_T - h\,b(Z_T)}{\sqrt{h}}$

given $\mathcal{F}_{T+}$ converges a.s. as $h \searrow 0$ to a normal distribution with mean 0 and
variance $a^2(Z_T)$. Also show that

$\lim_{h\searrow 0} \frac{E(Z_{T+h} - Z_T \mid \mathcal{F}_{T+})}{h} = b(Z_T)$ a.s.

and

$\lim_{h\searrow 0} \frac{E([Z_{T+h} - Z_T]^2 \mid \mathcal{F}_{T+})}{h} = a^2(Z_T)$ a.s.


Problem 19. Let $f\colon \mathbb{R} \to \mathbb{R}$ be a bounded continuous function with bounded
continuous first and second derivatives $f', f''$. Use the result of the preceding problem
and the Taylor formula with remainder to show that if $Z$ is the solution of (33.19)
with initial value $z$, then

$\lim_{t\searrow 0} \frac{E(f(Z_t)) - f(z)}{t} = \tfrac{1}{2}a^2(z)f''(z) + b(z)f'(z).$


Example 4. [Ornstein-Uhlenbeck processes] Let $\alpha$ and $\beta$ be positive constants.
For arbitrary initial conditions independent of $W$, solutions of the
stochastic differential equation

$dZ = \alpha\,dW - \beta Z\,dt$

are known as Ornstein-Uhlenbeck processes, with coefficients $\alpha$ and $\beta$. Such processes
are sometimes used to model the velocity of a small (but not microscopic)
particle suspended in a fluid. The stochastic differential $\alpha\,dW$ represents changes
in the velocity due to random bombardment by the molecules of the fluid, and
$-\beta Z\,dt$ represents the effect of friction. Problem 22 shows that the position of




such a particle will behave like standard Brownian motion if $\alpha$ and $\beta$ are large
and approximately equal.
By the Itô Lemma,

$d(e^{\beta t}Z) = \alpha e^{\beta t}\,dW.$

Thus

$Z_t = e^{-\beta t}Z_0 + \alpha e^{-\beta t}\int_0^t e^{\beta u}\,dW_u.$

If $Z_0$ is normally distributed with mean 0 and variance $\sigma^2$, then it follows from
Problem 7 that $Z$ is a Gaussian process with mean function 0 and covariance
function

$(s,t) \rightsquigarrow \sigma^2 e^{-\beta(s+t)} + \frac{\alpha^2}{2\beta}\big(e^{-\beta|s-t|} - e^{-\beta(s+t)}\big).$

If $\sigma^2 = \alpha^2/(2\beta)$, then the covariance function is

$(s,t) \rightsquigarrow \sigma^2 e^{-\beta|s-t|},$

which is a function of $|s - t|$. Thus, for this choice of $\sigma^2$, $Z$ is a 'stationary
Gaussian process' (the reader may provide a formal definition of this phrase), so
$Z$ is called a stationary Ornstein-Uhlenbeck process. In particular, the normal
distribution with mean 0 and variance $\alpha^2/(2\beta)$ is an equilibrium distribution
for the transition semigroup associated with the Markov family of Ornstein-Uhlenbeck
processes with coefficients $\alpha$ and $\beta$.
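The reduction of the covariance function to a function of $|s-t|$ can be checked numerically. The sketch below simply evaluates the two formulas; the function and parameter names (`alpha`, `beta`, `sigma2` for the coefficients and the initial variance) are illustrative.

```python
import math


def ou_covariance(s, t, alpha, beta, sigma2):
    """Covariance of Z_s and Z_t when Z_0 has mean 0 and variance sigma2."""
    decay = math.exp(-beta * (s + t))
    return sigma2 * decay + alpha**2 / (2.0 * beta) * (
        math.exp(-beta * abs(s - t)) - decay
    )


def stationary_variance(alpha, beta):
    """Initial variance for which the covariance depends only on |s - t|."""
    return alpha**2 / (2.0 * beta)
```

With `sigma2 = stationary_variance(alpha, beta)` the two terms involving $e^{-\beta(s+t)}$ cancel, leaving `sigma2 * exp(-beta * |s - t|)`; for any other initial variance the dependence on $s + t$ remains.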


Problem 20. Determine the relationship between Ornstein-Uhlenbeck processes 
and Ornstein-Uhlenbeck sequences which are defined in Chapter 28. 


Problem 21. Let $Z$ be an Ornstein-Uhlenbeck process with coefficients $\alpha$ and $\beta$
and (random or nonrandom) initial state $Z_0$. Find the limiting distribution of $Z_t$
as $t \to \infty$.


Problem 22. Let $Z$ be a stationary Ornstein-Uhlenbeck process with coefficients
$\alpha$ and $\beta$, and define $Y = (Y_t\colon t \in [0,\infty))$ by

$Y_t = \int_0^t Z_u\,du.$

Show that if $\alpha \to \infty$ and $\alpha/\beta \to 1$, then the distribution of $Y$ converges to that of
standard Brownian motion.




33.5. Generators and the Dirichlet problem 


Problem 19 constitutes a calculation of the generator (at least on a portion of its
domain) of the Markov family described in Corollary 8. There is an alternative
method for calculating the generator $G$ using the Itô Lemma.


Example 5. Let $Z$ be the solution of (33.19) with initial value $z$, the coefficients
$a$ and $b$ being assumed to have bounded slope. Inspired by Problem 19,
we set $Gf = \tfrac{1}{2}a^2 f'' + bf'$ for functions $f\colon \mathbb{R} \to \mathbb{R}$ having continuous first and
second derivatives, with the hope of proving that $G$ represents the generator. By
(33.19),

(33.26)    $dZ = a\circ Z\,dW + b\circ Z\,dt.$

Using this relation together with the Itô Lemma, we obtain

(33.27)    $d(f\circ Z) = (f'\circ Z)\,dZ + \tfrac{1}{2}(f''\circ Z)(a\circ Z)^2\,dt.$

From (33.26) and (33.27), we obtain

$d(f\circ Z) = (f'\circ Z)(a\circ Z)\,dW + Gf\circ Z\,dt.$

In integral form we have

(33.28)    $f(Z_t) - \int_0^t Gf(Z_u)\,du = f(z) + \int_0^t f'(Z_u)a(Z_u)\,dW_u, \quad t \in [0,\infty).$

Since $f'$ is bounded and $a$ has bounded slope, $|f'(x)a(x)| \le c|x| + d$ for some finite
constants $c, d$. It follows from Theorem 7 that $(f'\circ Z)(a\circ Z)$ is square-integrable.
By Theorem 4, the right side of (33.28) is a continuous-time martingale.

Thus, we have shown that $Z$ is a solution to the martingale problem for
$(G, \mathcal{D})$, where $\mathcal{D}$ is the collection of all bounded continuous functions $f\colon \mathbb{R} \to \mathbb{R}$
with bounded and continuous first and second derivatives. It follows that
the generator of the Markov family of solutions of (33.19) agrees with $G$ on
$\mathcal{D}$. Note that this example is an improvement on Problem 19, because we have
explicitly identified the continuous-time martingale in the martingale problem
as an Itô integral. Also note that our proof shows that (33.28) holds without the
assumption that $f$, $f'$, or $f''$ be bounded, as long as they are continuous.


Let $G$ be the generator of a Markov family of diffusions with coefficients $a$
and $b$ having bounded slope, as calculated in the preceding example, and let
$(x_1, x_2)$ be a bounded open interval in $\mathbb{R}$. Also let $f_1, f_2$ be two real numbers.
The Dirichlet problem for $G$ on $(x_1, x_2)$, with boundary values $f_1, f_2$, is to find a
continuous function $h\colon [x_1, x_2] \to \mathbb{R}$ with bounded continuous first- and
second-order derivatives on $(x_1, x_2)$ such that

(33.29)    $Gh(x) = 0, \quad x \in (x_1, x_2), \quad\text{and}\quad h(x_i) = f_i, \quad i = 1, 2.$




It is known from the theory of ordinary differential equations that the Dirichlet
problem has a solution if $a(x) > 0$ for $x \in [x_1, x_2]$. The differential equation
$Gu = 0$ is 'linear' in $u$, and there are standard nonprobabilistic methods for
obtaining an explicit formula for the solution of linear 'boundary-value problems'
like the Dirichlet problem.

Our goal in the remainder of this section is to use the methods of this chapter 
to find a probabilistic formula for the solution of the Dirichlet problem. We will 
also find probabilistic formulas for the solutions of other related boundary-value 
problems. To save space, we will use the fact that solutions of linear boundary- 
value problems exist to derive these formulas, even though it is possible to use 
probabilistic arguments to prove this fact directly. 

For each $x \in \mathbb{R}$ let $Z^{(x)}$ denote the unique solution of the stochastic differential
equation (33.19), with coefficients $a, b$ having bounded slope, and initial value $x$.
Let

$T_x = \inf\{t \ge 0\colon Z_t^{(x)} = x_1 \text{ or } x_2\}.$

Thus, for $x \in [x_1, x_2]$, $T_x$ is the first exit time of the interval $(x_1, x_2)$ for the
process $Z^{(x)}$. Note that $T_x$ is a stopping time with respect to the minimal
filtration of the Wiener process used to construct $Z^{(x)}$.


Theorem 9. Suppose $a(x) > 0$ for $x \in [x_1, x_2]$, and let $g\colon [x_1, x_2] \to \mathbb{R}$ be the
unique continuous function satisfying $Gg(x) = -1$, $x \in (x_1, x_2)$, with boundary
values $g(x_1) = g(x_2) = 0$. Then $E(T_x) = g(x)$, $x \in [x_1, x_2]$.


PROOF. By (33.28)

$g(Z_t^{(x)}) - \int_0^t Gg(Z_u^{(x)})\,du = g(x) + \int_0^t g'(Z_u^{(x)})a(Z_u^{(x)})\,dW_u, \quad t \in [0,\infty).$

(In order for this equation to hold for all $t$, we should extend $g$ to be defined
on all of $\mathbb{R}$. However, we are only interested in $t \le T_x$, so such an extension is
irrelevant.) For $u < T_x$, $Gg(Z_u^{(x)}) = -1$. Thus

(33.30)    $g(Z^{(x)}_{T_x\wedge t}) + (T_x \wedge t) = g(x) + \int_0^{T_x\wedge t} g'(Z_u^{(x)})a(Z_u^{(x)})\,dW_u, \quad t \in [0,\infty).$

The right side has expected value $g(x)$ by Theorem 4. Letting $t \to \infty$ and
taking expected values on the left side, we see from the Monotone Convergence
Theorem that $E(T_x) < \infty$, so $T_x < \infty$ a.s. Hence, $Z^{(x)}_{T_x\wedge t} \to Z^{(x)}_{T_x}$ a.s. as $t \to \infty$.
The desired result now follows from the Bounded Convergence Theorem and the
fact that the boundary values of $g$ are 0. □
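The boundary-value problem $Gg = -1$, $g(x_1) = g(x_2) = 0$ can also be solved numerically when no closed form is at hand. The following is a minimal finite-difference sketch, assuming $Gg = \tfrac{1}{2}a^2 g'' + b g'$ and central differences on a uniform grid; the function name, grid size `n`, and the tridiagonal solver are illustrative choices, not methods used in the text.

```python
def expected_exit_time(a, b, x1, x2, n=200):
    """Approximate g with (1/2) a^2 g'' + b g' = -1 on (x1, x2), g(x1) = g(x2) = 0."""
    h = (x2 - x1) / n
    xs = [x1 + i * h for i in range(n + 1)]
    lower, diag, upper, rhs = [], [], [], []
    for i in range(1, n):                      # interior grid points
        d = 0.5 * a(xs[i]) ** 2 / h**2         # weight from the second derivative
        c = b(xs[i]) / (2.0 * h)               # weight from the first derivative
        lower.append(d - c)
        diag.append(-2.0 * d)
        upper.append(d + c)
        rhs.append(-1.0)
    m = n - 1
    # Tridiagonal elimination (Thomas algorithm): forward sweep ...
    for i in range(1, m):
        w = lower[i] / diag[i - 1]
        diag[i] -= w * upper[i - 1]
        rhs[i] -= w * rhs[i - 1]
    # ... then back substitution.
    g = [0.0] * m
    g[-1] = rhs[-1] / diag[-1]
    for i in range(m - 2, -1, -1):
        g[i] = (rhs[i] - upper[i] * g[i + 1]) / diag[i]
    return xs, [0.0] + g + [0.0]
```

For the Wiener process ($a = 1$, $b = 0$) the exact solution is $g(x) = (x - x_1)(x_2 - x)$, and since central differences are exact for quadratics the sketch should reproduce it at the grid points up to rounding.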


Using similar methods, we can also obtain a formula for $E(T_x^2)$. The proof is
requested of the reader in Problem 24.




Theorem 10. Suppose $a(x) > 0$ for $x \in [x_1, x_2]$, let $g$ be the function defined
in Theorem 9, and $\gamma\colon [x_1, x_2] \to \mathbb{R}$ the unique continuous function, with
bounded continuous first and second derivatives on $(x_1, x_2)$, satisfying $G\gamma(x) =
-(g'(x)a(x))^2$, $x \in (x_1, x_2)$, with boundary values $\gamma(x_1) = \gamma(x_2) = 0$. Then
$E(T_x^2) = \gamma(x) + g(x)^2$, $x \in [x_1, x_2]$.


Now that we know that $T_x$ is a.s.-finite, the following formula for the solution
of the Dirichlet problem is easy to derive from the fact that $Z^{(x)}$ is a solution to
the martingale problem for $(G, \mathcal{D})$.


Theorem 11. Suppose that $a(x) > 0$ for $x \in [x_1, x_2]$, and let $h\colon [x_1, x_2] \to \mathbb{R}$
be the unique solution of the Dirichlet problem on $(x_1, x_2)$ for $G$ with boundary
values $f_1, f_2$. Then

$h(x) = f_1 P[Z^{(x)}_{T_x} = x_1] + f_2 P[Z^{(x)}_{T_x} = x_2], \quad x \in [x_1, x_2].$


Corollary 12. Suppose $a(x) > 0$ for $x \in [x_1, x_2]$, and let $h\colon [x_1, x_2] \to \mathbb{R}$ be
any nonconstant continuous function that satisfies $Gh(x) = 0$ for all $x \in [x_1, x_2]$.
Then for all $x \in [x_1, x_2]$,

$P[Z^{(x)}_{T_x} = x_1] = \frac{h(x) - h(x_2)}{h(x_1) - h(x_2)}.$


We can obtain an explicit formula for a suitable function $h$ in the preceding
corollary. Let

(33.31)    $s(u) = \exp\Big(-\int_{u_0}^u \frac{2b(y)}{a^2(y)}\,dy\Big),$

where $u_0$ is some constant. It is easily checked that

$h(x) = \int_{x_0}^x s(u)\,du$

satisfies the conditions of the corollary for arbitrary intervals $[x_1, x_2]$ on which $a$
is positive, and for arbitrary values of $x_0, u_0 \in [x_1, x_2]$.

Note that $h$ is strictly increasing, and hence invertible. For $y = h(x)$, $x \in
[x_1, x_2]$, let $Y^{(y)} = h\circ Z^{(x)}$ and write $S_y = T_x$, $y_1 = h(x_1)$, and $y_2 = h(x_2)$. Note
that $S_y$ is the first exit time of the interval $(y_1, y_2)$ by the process $Y^{(y)}$. Trivially,
Corollary 12 implies that

(33.32)    $P[Y^{(y)}_{S_y} = y_1] = \frac{y - y_2}{y_1 - y_2}, \quad y \in [y_1, y_2].$

In Problem 26, the reader is asked to show that the processes $Y^{(y)}$, $y \in [y_1, y_2]$,
are diffusions that all satisfy the same differential equation. We express the fact
that this collection of diffusions satisfies (33.32) by saying that it is on its natural
scale. Since $h$ transforms the collection $(Z^{(x)}\colon x \in [x_1, x_2])$ into a collection that
is on its natural scale, we call $h$ the scale function of the diffusions $Z^{(x)}$.
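The function $h$ built from (33.31) can be tabulated numerically and inserted into the formula of Corollary 12. The sketch below takes $x_0 = u_0 = x_1$ and uses the trapezoid rule for both integrals; the function names and the grid size `n` are illustrative assumptions.

```python
import math


def scale_values(a, b, x1, x2, n=2000):
    """Grid xs on [x1, x2] and values h(xs[i]), with x0 = u0 = x1."""
    step = (x2 - x1) / n
    xs = [x1 + i * step for i in range(n + 1)]
    rate = [2.0 * b(y) / a(y) ** 2 for y in xs]
    expo = 0.0
    s = [1.0]                                  # s(u0) = exp(0) = 1
    for i in range(1, n + 1):
        expo += 0.5 * (rate[i - 1] + rate[i]) * step
        s.append(math.exp(-expo))
    h = [0.0]                                  # h(x0) = 0
    for i in range(1, n + 1):
        h.append(h[-1] + 0.5 * (s[i - 1] + s[i]) * step)
    return xs, h


def exit_probability_left(a, b, x1, x2, n=2000):
    """Approximate P[exit at x1] = (h(x) - h(x2)) / (h(x1) - h(x2)) on the grid."""
    xs, h = scale_values(a, b, x1, x2, n)
    return xs, [(hx - h[-1]) / (h[0] - h[-1]) for hx in h]
```

With $b = 0$ the function $s$ is constant and the exit probability is linear in $x$. With constant $a$ and $b$ the function $h$ is an affine function of $e^{-2bx/a^2}$, which gives a closed form against which the numerical values can be compared.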




Problem 23. Let $f\colon \mathbb{R} \to \mathbb{R}$ have bounded continuous second derivative, but do
not assume that either $f$ or $f'$ is bounded. Let $a$ and $b$ be diffusion and drift
coefficients with bounded slope, and assume that $a$ is bounded. Define $Gf(x) =
\tfrac{1}{2}a^2(x)f''(x) + b(x)f'(x)$. Show that the left side of (33.28) is a continuous-time
martingale.


Problem 24. Prove Theorem 10. Hint: First use (33.30) to show that

$E(T_x^2) = g(x)^2 + E\Big(\int_0^{T_x} \big(g'(Z_u^{(x)})\,a(Z_u^{(x)})\big)^2\,du\Big).$

Then carry out an argument similar to the proof of Theorem 9.


Problem 25. Prove Theorem 11 and its corollary.


Problem 26. Prove that the processes $Y^{(y)}$, $y \in [y_1, y_2]$, defined above
are all solutions of the stochastic differential equation

$dY = (h'\circ h^{-1}\circ Y)(a\circ h^{-1}\circ Y)\,dW.$


Problem 27. Find explicit formulas for $E(T_x)$, $E(T_x^2)$, and $P[Z^{(x)}_{T_x} = x_i]$, $i = 1, 2$,
for the case in which $a > 0$ and $b$ are constants, the so-called 'scaled' Wiener
process with 'drift'. (You will need to be able to solve some simple linear ordinary
differential equations with constant coefficients.)


Problem 28. Find a formula in terms of a normal distribution function for $P[Z^{(x)}_{T_x} =
x_i]$, $i = 1, 2$, for the Markov family of Ornstein-Uhlenbeck processes with coefficients
$\alpha, \beta$.


33.6. Diffusions in higher dimensions 


Most of what we have done in this chapter generalizes easily to dimensions $d \ge 2$.
The starting point of such a generalization is the d-dimensional Wiener process,
which is the $C([0,\infty), \mathbb{R}^d)$-valued random variable $W$ defined by

$W = (W^{(1)}, \dots, W^{(d)}),$

where the $W^{(i)}$, $i = 1, \dots, d$, are independent (1-dimensional) Wiener processes.

By mimicking the development given in this chapter, it is straightforward to
give meaning to integration with respect to $W$, with results like Theorem 4 and
the Itô Lemma having natural generalizations. This $d$-dimensional 'Itô integral'
gives meaning to stochastic differential equations of the form

$dZ = a(Z)\,dW + b(Z)\,dt,$

where for each $x \in \mathbb{R}^d$, $a(x)$ is a $d \times d$ matrix and $b(x)$ is a member of $\mathbb{R}^d$.
Using a suitable definition of ‘bounded slope’, the arguments we used in the 

proof of Theorem 7 can be used with very little modification to show the existence 

and uniqueness of solutions of stochastic differential equations whose coefficients 




have bounded slope. For a given pair of coefficients $a, b$, the collection of solutions
with initial states $x \in \mathbb{R}^d$ forms a Markov family whose generator $G$ takes
the form

(33.33)    $Gf(x) = \frac{1}{2}\sum_{i,j=1}^d a_{ij}(x)\,D_{ij}f(x) + \sum_{i=1}^d b_i(x)\,D_i f(x),$

where $a_{ij}(x)$ is the $ij$-entry of the symmetric $d \times d$ matrix $a^T(x)a(x)$ and $b_i(x)$
is the $i$th entry of the vector $b(x)$. This formula is valid for bounded functions $f$
having bounded continuous first- and second-order partial derivatives $D_i f$, $D_{ij} f$.

We have already seen that diffusions can be used to solve the Dirichlet problem
in 1 dimension. The generalization of this fact to higher dimensions has great
importance in the field of partial differential equations. Let $U$ be a bounded
connected open subset of $\mathbb{R}^d$, with boundary $\partial U$. Given an operator $G$ of the
form (33.33) and a continuous function $f\colon \partial U \to \mathbb{R}$, the Dirichlet problem for $G$
on $U$ with boundary condition $f$ is to find a continuous function $h\colon U \cup \partial U \to \mathbb{R}$
such that

$Gh(x) = 0, \quad x \in U, \quad\text{and}\quad h(x) = f(x), \quad x \in \partial U.$

Because the set $U$ can have a complicated shape when $d \ge 2$, this problem is
considerably more difficult in higher dimensions than in 1 dimension. Nevertheless,
it can be shown that when a solution exists, it takes the form

(33.34)    $h(x) = E\big(f(Z^{(x)}_{T_x})\big),$

where $T_x$ is the hitting time of $\partial U$ by $Z^{(x)}$, the solution of the stochastic differential
equation with coefficients $a$, $b$, and initial value $x$. Even when no solution
exists, as can happen when $U$ or $f$ is not sufficiently nice, it is still useful to
regard (33.34) as a 'generalized solution' of the Dirichlet problem.


* Problem 29. What is the generator of the d-dimensional Wiener process? 


Problem 30. [A special case of the Itô Lemma] Let $f\colon \mathbb{R}^d \to \mathbb{R}$ be a continuous
function with continuous first- and second-order partial derivatives $D_i f$, $D_{ij} f$,
$i, j = 1, \dots, d$. Appropriately interpret and prove the following formula:

$d(f\circ W) = \nabla f\circ W \cdot dW + \tfrac{1}{2}\Delta f\circ W\,dt,$

where '$\cdot$' denotes the usual Euclidean inner product, $\nabla$ denotes the gradient operator,
and $\Delta$ denotes the Laplacian operator.


PART 7 
Appendices 




Appendix A includes short descriptions of symbols and usage of terms relevant 
for reading the text. It focuses more on notational conventions and concepts 
from areas of mathematics different from probability than on the mathematics 
introduced in this book. 

Appendices B-E contain introductions to some non-probabilistic topics that 
are important for certain sections of the book. Portions of Appendices B and C 
on metric spaces and topology are relevant for the beginning chapters. 

Appendix F contains a list of books which one might want to read either 
concurrently with or subsequently to reading this book. References relevant 
to some specific propositions or theorems in the text are given in Appendix G 
along with some additional comments. The rule we have tried to follow for the 
inclusion of references in this section is: include a reference if the result is not 
yet part of the general body of knowledge that has appeared in other textbooks 
and we feel it is not known by most probabilists. We apologize for any oversights 
in this connection. 


APPENDIX A 
Notation and Usage of Terms 


The first section of this appendix defines a variety of symbols used in the text, 
focusing on those related to prerequisite mathematical knowledge. The second 
section specifies how certain terms will be used, especially those which are used 
in various ways by different authors. A few exercises connected with notational 
issues constitute the third section. 


A.1. Symbols 


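$\stackrel{\mathrm{def}}{=}$ indicates that the expression to the left is being defined as the expression on
the right, and is only used when required for clarity.

$A^c = \{x\colon x \notin A\}$

$A \setminus B = \{x\colon x \in A \cap B^c\}$

$A \,\triangle\, B = (A \setminus B) \cup (B \setminus A)$

$\emptyset$ denotes the empty set.

$A \subseteq B$ means that $A$ is a subset of $B$.

$A \subsetneq B$ means that $A \ne B$ and $A \subseteq B$.

$A \supseteq B$ means that $A$ is a superset of $B$.

$A \supsetneq B$ means that $A \ne B$ and $A \supseteq B$.

$A - y = \{x - y\colon x \in A\}$

$A \vee B$ denotes the convex hull of $A$ and $B$.

$A + B = \{x + y\colon x \in A,\ y \in B\}$, the Minkowski sum of $A$ and $B$.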


$\mathbb{R} = (-\infty, \infty)$

$\overline{\mathbb{R}} = [-\infty, \infty]$

$\mathbb{R}^+ = [0, \infty)$

$\overline{\mathbb{R}}^+ = [0, \infty]$




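$\mathbb{Z} = \{\dots, -2, -1, 0, 1, 2, 3, \dots\}$

$\overline{\mathbb{Z}} = \{-\infty, \dots, -2, -1, 0, 1, 2, 3, \dots, \infty\}$

$\mathbb{Z}^+ = \{0, 1, 2, 3, \dots\}$

$\overline{\mathbb{Z}}^+ = \{0, 1, 2, 3, \dots, \infty\}$

$\#A$ denotes the cardinality of the set $A$.

$|J|$ denotes the length of the interval $J$.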


$\mathbb{C}$ denotes the set of complex numbers. See Appendix E.

$|z|$ denotes the absolute value of the complex number $z$, that is, the distance in
the complex plane between $z$ and 0.

$\Re(z)$ denotes the real part of the complex number $z$.

$\Im(z)$ denotes the imaginary part of the complex number $z$.

$\mathbb{R}^d$ denotes $d$-dimensional Euclidean space.

$e_i$, for $i \le d$, denotes the $i$th member of the standard basis for $\mathbb{R}^d$.

$\langle x, y\rangle$ denotes the inner product (that is, dot product or scalar product) of $x$ and
$y$ in $\mathbb{R}^d$.

$|x|$ denotes the norm (that is, distance from origin) of $x \in \mathbb{R}^d$, which reduces to
the absolute value of $x$ if $d = 1$.

$B^T$ denotes the transpose of the matrix $B$.

A vector $x \in \mathbb{R}^d$ can be viewed as a row matrix. Thus $x^T$ denotes the corresponding
column matrix.

$\|\cdot\|$ denotes norm in a variety of spaces.

$\arg z$ denotes the polar coordinates angle of a point $z \in \mathbb{R}^2$.

$(a)_b^{\uparrow}$ denotes the rising factorial $\prod_{k=0}^{b-1}(a + k)$, $a \in \mathbb{R}$, $b \in \mathbb{Z}^+$.

$(a)_b^{\downarrow}$ denotes the falling factorial $\prod_{k=0}^{b-1}(a - k)$, $a \in \mathbb{R}$, $b \in \mathbb{Z}^+$.


the binomial and multinomial coefficients:

$\binom{n}{r_1\ r_2\ \cdots\ r_d} = \frac{n!}{r_1!\,r_2!\cdots r_d!}$ for $n = r_1 + r_2 + \cdots + r_d$


$f\colon A \to B$ describes $f$ to be a function with domain $A$ and target $B$.


$x \rightsquigarrow f(x)$ is another name for the function $f$. Thus $x \rightsquigarrow x^2$ denotes the squaring
function.
function. 




$f\circ g$ denotes the composition of the functions $f$ and $g$, but see Appendix E.

$f^{-1}(C) = \{x\colon f(x) \in C\}$.


$\cdot$ is used as a place holder for the domain variable in a function. Thus if $f$ is
a function of two variables, $f(\cdot, y)$ denotes the function of the first variable
obtained by fixing $y$ to be the value of the second variable.


a.s. is an abbreviation for almost surely. 
a.e. is an abbreviation for almost everywhere. 
i.p. is an abbreviation for in probability. 


iid is an abbreviation for independent, identically distributed. 


[A] denotes the equivalence class to which A belongs. Typical uses: [A] for the 
collection of events equal to A almost surely and [X] for the collection of random 
variables equal to X almost surely. 


$a \wedge b$ denotes the smaller of $a$ and $b$.

$a \vee b$ denotes the larger of $a$ and $b$.

$\inf A$, the infimum of the set $A \subseteq \overline{\mathbb{R}}$, denotes the greatest lower bound of $A$,
with $\inf \emptyset = \infty$.

$\sup A$, the supremum of the set $A \subseteq \overline{\mathbb{R}}$, denotes the least upper bound of $A$, with
$\sup \emptyset = -\infty$. Occasionally the universal set will be $\overline{\mathbb{R}}^+$, in which case $\sup \emptyset = 0$.


min A, the minimum of the set A C R, denotes the smallest member of A, and 
thus entails the assertion that A has a smallest member. 


$\max A$, the maximum of the set $A \subseteq \overline{\mathbb{R}}$, denotes the largest member of $A$, and
thus entails the assertion that $A$ has a largest member.


$a^+ = a \vee 0$

$a^- = (-a) \vee 0$


$\lfloor a \rfloor$, the floor of $a$, is the largest integer no larger than $a$.


$\lceil a \rceil$, the ceiling of $a$, is the smallest integer no smaller than $a$.


$f(x-) = \lim_{y \nearrow x} f(y)$

$f(x+) = \lim_{y \searrow x} f(y)$

$f(\infty) = \lim_{y \to \infty} f(y)$, unless $\infty$ is a member of the domain of $f$ in which case
$f(\infty)$ has a direct meaning.


$f(-\infty) = \lim_{y \to -\infty} f(y)$, unless $-\infty$ is a member of the domain of $f$ in which
case $f(-\infty)$ has a direct meaning.




$O(f(x))$ as $x \to a \in \overline{\mathbb{R}}$ means that $\limsup \left|\frac{O(f(x))}{f(x)}\right| < \infty$ as $x \to a$ through the
domain of $O(f(x))$.


$o(f(x))$ as $x \to a \in \overline{\mathbb{R}}$ means that $\frac{o(f(x))}{f(x)} \to 0$ as $x \to a$ through the domain of
$o(f(x))$.


$I_A$ denotes the indicator function of the set $A$.

$\exp$ denotes the natural exponential function. Thus $\exp(x) = e^x$. Also, see
Appendix E.

log denotes the natural logarithmic function, the inverse function of exp. Also, 
see Appendix E. 


$C[a,b]$ and $C[a,\infty)$ denote the spaces of continuous $\mathbb{R}$-valued functions on the
intervals $[a,b]$ and $[a,\infty)$, respectively.

$D[a,b]$ and $D[a,\infty)$ denote the spaces of right-continuous $\mathbb{R}$-valued functions
having left limits (that is, cadlag functions) on the intervals $[a,b]$ and $[a,\infty)$,
respectively.

$D^+[a,b]$ and $D^+[a,\infty)$ denote the spaces of increasing right-continuous
$\overline{\mathbb{R}}^+$-valued functions on the intervals $[a,b]$ and $[a,\infty)$, respectively.

$D([0,\infty), \Psi)$ denotes the space of right-continuous $\Psi$-valued functions having
left limits (that is, cadlag functions) on the interval $[0,\infty)$.


E or Ep denotes the expectation operator, the subscript P emphasizing that P 
is the underlying probability measure. 

E(X ; B) denotes the expectation of the product of X and the indicator function 
of B. 

Var(X), Cov(X,Y), and Corr(X, Y), respectively, denote the variance of X, the 
covariance of X and Y, and the correlation of X and Y (with a subscript P 
being used, if appropriate, to emphasize that P is the underlying probability 
measure). 


$X_n \xrightarrow{\mathcal{D}} X$ as $n \to \infty$ means that the sequence $(X_n)$ converges to $X$ in distribution.


$\int_a^\infty g\,dF$ denotes $\lim_{b\to\infty} \int_a^b g\,dF$, a limit of Riemann-Stieltjes integrals.


A.2. Usage 
Binary digits is used for what some call bits. 
$x_n \to l$ as $n \to \infty$ has the same meaning as $\lim_{n\to\infty} x_n = l$, and is suitable for
modification for extra meaning: $x_n \nearrow l$ as $n \nearrow \infty$ entails the assertion that the
sequence $(x_1, x_2, \dots)$ is increasing for all sufficiently large subscripts. Similarly

$x_n \searrow l$ as $n \nearrow \infty$ implies that the sequence is ultimately decreasing.


The product of 0 and œ is to be understood to equal 0, unless the surrounding 
discussion makes it clear that some other view is appropriate. 


$0^0 = 1$, $1^\infty = 1$, unless otherwise specified.
In $\overline{\mathbb{R}}^+$-setting, $\frac{b}{0} = \infty$ for $b > 0$ is sometimes used.


The 0-fold convolution of a distribution is the identity for the convolution oper- 
ation, namely the delta distribution at 0. 


The derivative of order 0 of a function is the function itself. 

A sum of an empty collection of summands is equal to 0. Example: $\sum_{k=1}^{0} k = 0$.

A product of an empty collection of factors is 1. Example: $\prod_{k=1}^{0} k = 1$.

$0! = (a)_0^{\uparrow} = (a)_0^{\downarrow} = 1$ for $a \in \mathbb{R}$.
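These conventions agree with how empty sums and products behave in most programming languages. The following small sketch, with illustrative function names, defines the rising and falling factorials as products over $k = 0, \dots, b-1$, so that the case $b = 0$ is an empty product.

```python
import math


def rising_factorial(a, b):
    """Product of (a + k) for k = 0, ..., b - 1; empty product when b = 0."""
    return math.prod(a + k for k in range(b))


def falling_factorial(a, b):
    """Product of (a - k) for k = 0, ..., b - 1; empty product when b = 0."""
    return math.prod(a - k for k in range(b))
```

Here `rising_factorial(3, 2)` is $3 \cdot 4$, `falling_factorial(5, 3)` is $5 \cdot 4 \cdot 3$, and both functions return 1 when `b` is 0, matching the convention stated above.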

The union of an empty collection of sets is the empty set. 


The intersection of an empty collection of sets is the universal set, possibly not 
identified explicitly. 


proper difference of $A$ and $B$ is an appropriate description of $A \setminus B$ in the case
that $B \subseteq A$; it is not required that $B$ be different from either $A$ or $\emptyset$.


That $A$ be a proper subset of $B$ means that $A \subsetneq B$, the possibility that $A = \emptyset$
not being excluded.


$(A_1, A_2, \dots)$ is a decreasing sequence of sets if $A_n \supseteq A_{n+1}$ for every $n$.
$(A_1, A_2, \dots)$ is an increasing sequence of sets if $A_n \subseteq A_{n+1}$ for every $n$.
$(A_1, A_2, \dots)$ is a monotonic sequence of sets if it is decreasing or increasing.

$(A_1, A_2, \dots)$ is a strictly decreasing sequence of sets if $A_n \supsetneq A_{n+1}$ for every $n$.
$(A_1, A_2, \dots)$ is a strictly increasing sequence of sets if $A_n \subsetneq A_{n+1}$ for every $n$.
$(A_1, A_2, \dots)$ is a strictly monotonic sequence of sets if it is strictly decreasing or
strictly increasing.


The symbols $\mathcal{A}$, $\mathcal{B}$, and $\mathcal{C}$ always denote Borel $\sigma$-fields. Arbitrary $\sigma$-fields can be
denoted by symbols such as $\mathcal{F}$ and $\mathcal{G}$, whether Borel or not. And letters in this
style may not even denote $\sigma$-fields. For instance, $\mathcal{E}$ is often used for an arbitrary
family of sets.


image of f is the set {y: f(z) = y for some z} 
image of x under f is f(x) 


target of f is any set of which the image of f is asserted to be a subset. 


$f^+ = f \vee 0$ is called the positive part of the function $f$.
$f^- = (-f) \vee 0$ is called the negative part of the function $f$.
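As a small illustration of the positive and negative parts, the following Python sketch builds both from a function and checks the identities $f = f^+ - f^-$ and $|f| = f^+ + f^-$ at a few points. The helper names `positive_part` and `negative_part` and the sample function are illustrative assumptions, not notation from the text.

```python
def positive_part(f):
    """Return f+ = f v 0, i.e. the function x -> max(f(x), 0)."""
    return lambda x: max(f(x), 0.0)


def negative_part(f):
    """Return f- = (-f) v 0, i.e. the function x -> max(-f(x), 0)."""
    return lambda x: max(-f(x), 0.0)


def f(x):
    # Hypothetical example function; any real-valued function would do.
    return x**3 - x


f_plus = positive_part(f)
f_minus = negative_part(f)

for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    # Both parts are nonnegative, f = f+ - f-, and |f| = f+ + f-.
    assert f_plus(x) >= 0 and f_minus(x) >= 0
    assert f_plus(x) - f_minus(x) == f(x)
    assert f_plus(x) + f_minus(x) == abs(f(x))
```

Note that both parts are nonnegative functions; the negative part is not the negative of anything.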




f is cadlag if it is right-continuous and has left limits on its domain. 
$f$ is decreasing if $f(x) \ge f(y)$ whenever $x < y$.

$f$ is increasing if $f(x) \le f(y)$ whenever $x < y$.

$f$ is monotonic if $f$ is decreasing or is increasing.

$f$ is strictly decreasing if $f(x) > f(y)$ whenever $x < y$.

$f$ is strictly increasing if $f(x) < f(y)$ whenever $x < y$.


f is strictly monotonic if f is strictly decreasing or is strictly increasing. 


An R-valued operator is positive if it assigns nonnegative values to positive ele- 
ments. 

An R-valued operator is strictly positive if it assigns positive values to positive 
elements. 


The term definite is adjoined to the terms positive and strictly positive when 
viewing matrices as operators, this adjective making it clear that one is not 
speaking of the entries of the matrix. 


$\omega$ sometimes means $\{\omega\}$.


partition is used in two distinct related ways; check the index. 


A.3. Exercises on subtle distinctions 


The purpose of these exercises is to focus attention on some of the conventions 
described in the preceding section. 


Problem 1. Prove that a function $f\colon \mathbb{R} \to \mathbb{R}$ that is strictly increasing on an
interval $[a,b]$ and on an interval $[b,c]$ is strictly increasing on the interval $[a,c]$.


* Problem 2. Use the preceding problem and a standard calculus theorem to prove
that the function $x \mapsto x - \sin x$ is strictly increasing on the interval $[-2\pi, 2\pi]$.


Problem 3. For a one-to-one function $f$, the notation $f^{-1}$ has two distinct but
closely related meanings. Discuss. 


Problem 4. Let $a > b > 0$. Prove that the sequence $((a)_c^{\uparrow}/(b)_c^{\uparrow} : c = 0, 1, 2, \dots)$ is
strictly increasing. Does your proof show a strict increase from $c = 0$ to $c = 1$ or
is a separate argument needed? Explain.


APPENDIX B 
Metric Spaces 


Often, measurable sets are Borel sets, that is, members of the smallest $\sigma$-field
containing all the open sets. A natural setting for Borel $\sigma$-fields is that of metric
spaces, properties of which we review here. Also, some important examples will 
be examined. 


B.1. Definition 


A metric space consists of a set $\Psi$ and a function $\rho\colon \Psi \times \Psi \to \mathbb{R}^+$ satisfying the
following properties:

(i) $\rho(x,y) = 0$ if and only if $y = x$;
(ii) $\rho(x,y) = \rho(y,x)$;
(iii) $\rho(x,z) \le \rho(x,y) + \rho(y,z)$.

The last two properties are called symmetry and triangle inequality, respectively.
The function $\rho$ is called the metric on $\Psi$ and its value at a particular pair $(x,y)$
is the distance between $x$ and $y$. A metric space, thus defined, is denoted by
$(\Psi, \rho)$, or more briefly by $\Psi$ if there is no ambiguity concerning the metric.

For $x$ a member of a metric space $(\Psi, \rho)$ and $\varepsilon > 0$, the sets

$$\{y : \rho(x,y) < \varepsilon\}, \quad \{y : \rho(x,y) \le \varepsilon\}, \quad \text{and} \quad \{y : \rho(x,y) = \varepsilon\}$$

are called the open ball, closed ball, and sphere of radius $\varepsilon$ centered at $x$. A
subset $B$ of a metric space with metric $\rho$ is an open set if for every $x \in B$ there
exists $\varepsilon > 0$ such that the open ball of radius $\varepsilon$ centered at $x$ is a subset of $B$.
It is easy to use the triangle inequality to prove that every open ball is an open
set. A set is a closed set if its complement is open. It is easy to prove that all
spheres and closed balls are closed sets.

For a set $C \subseteq \Psi$, the interior of $C$ is the largest open subset of $C$; the
closure of $C$ is its smallest closed superset; and the boundary of $C$, denoted by
$\partial C$, consists of those points in its closure that are not also in its interior. It
is possible to prove the existence of the interior and closure of any subset of a
metric space.

A subset B of a set C in a metric space is dense in C if every ball centered 
at a point in C contains a member of B. The modifying phrase “in C” is often 
omitted if C is the entire metric space. A metric space is separable if it contains 
a countable dense subset. 

A set in a metric space is bounded if it is contained in some ball. It is totally 
bounded if for every $\varepsilon > 0$, it is contained in the union of a finite collection of
balls of radius less than $\varepsilon$. It is compact if every open cover of it has a finite
subcovering. (A collection of sets is called a cover of a set C if C is a subset 
of their union; the cover is open if each of the sets in the cover is open.) A set 
is relatively compact if it has compact closure. If any of the adjectives bounded, 
totally bounded, and compact apply to the set of all points in a metric space, 
then the corresponding adjective is also used for the metric space itself. 


* Problem 1. Prove the following facts about sets in any metric space.
• Finite intersections and arbitrary unions of open sets are open.
• Finite unions and arbitrary intersections of closed sets are closed.
• Finite unions and arbitrary intersections of compact sets are compact.
• Every compact set is closed.
• A closed subset of a compact set is compact.
• The intersection of a collection of compact sets is empty if and only if the
  intersection of some finite subcollection is empty.
• Any set in a metric space is itself a metric space with the inherited metric.


Problem 2. Use the first two items in the preceding problem to prove the facts 
mentioned above: every set has an interior, possibly empty; every set has a closure. 


B.2. Sequences 


A sequence $(x_1, x_2, \dots)$ in a metric space $(\Psi, \rho)$ converges to a point $x \in \Psi$ if,
for every $\varepsilon > 0$, there exists an integer $p$ such that $\rho(x_n, x) < \varepsilon$ whenever $n \ge p$.
The sequence is Cauchy if, for every $\varepsilon > 0$, there exists $p$ such that $\rho(x_m, x_n) < \varepsilon$
whenever $n \ge m \ge p$.


Proposition 1. Every convergent sequence in a metric space is Cauchy, and 
every Cauchy sequence which has a convergent subsequence converges. 


Problem 3. Prove the preceding proposition. 


A metric space in which every Cauchy sequence is convergent is said to be 
complete. 


Proposition 2. Every sequence in a totally bounded metric space has a sub- 
sequence that is Cauchy. 


Problem 4. Prove the preceding proposition. 


A set C in a metric space is relatively sequentially compact if every sequence 
in C has a subsequence that converges. In case the subsequence can always be 
chosen so that its limit belongs to C, C is sequentially compact. The proof of 
the following result will be omitted. 


Proposition 3. Compactness is equivalent to sequential compactness; rela- 
tive compactness is equivalent to relative sequential compactness. 


In practice one often proves that a sequence converges by simultaneously 
proving relative sequential compactness and that every convergent subsequence 
has the same limit. The next proposition entails a recipe for doing this. 


Proposition 4. A sequence $(x_n : n = 1, 2, \dots)$ converges to a limit $y$ if and
only if every subsequence of $(x_n : n = 1, 2, \dots)$ has a further subsequence that
converges to $y$.


* Problem 5. Prove the preceding proposition. 


B.3. Continuous functions 


A function $g$ from one metric space $(\Psi_1, \rho_1)$ to another metric space $(\Psi_2, \rho_2)$
is continuous at $x \in \Psi_1$ if for every $\varepsilon > 0$, there exists $\delta > 0$ such that
$\rho_2(g(y), g(x)) < \varepsilon$ for every $y$ satisfying $\rho_1(x,y) < \delta$. Equivalently, $g$ is
continuous at $x$ if for every sequence $(x_n : n = 1, 2, \dots)$ in $\Psi_1$ converging to $x$, it is
true that $g(x_n) \to g(x)$ as $n \to \infty$. The function $g$ is continuous if it is
continuous at each point. It is uniformly continuous if $\delta$ can be chosen to depend only
on $\varepsilon$ and $g$, but not on $x$.


Problem 6. Prove that if $f\colon \Psi_1 \to \Psi_2$ is a function from one metric space to
another, then $f$ is continuous if and only if for any open set $A \subseteq \Psi_2$, $f^{-1}(A)$ is an
open subset of $\Psi_1$.


Problem 7. Prove that if f is a continuous function from one metric space to an- 
other, then the image under f of any compact set is compact. Hint: Use Problem 6. 


B.4. Important metric spaces 


The function $(x,y) \mapsto |y - x|$ is a metric for $\mathbb{R}$. Another metric for $\mathbb{R}$ is
$(x,y) \mapsto |\arctan y - \arctan x|$. These two metrics for $\mathbb{R}$ make $\mathbb{R}$ into two different metric
spaces. With the first metric, $\mathbb{R}$ is complete but not bounded, and with the
second it is totally bounded but not complete. However, the open sets determined
by the two metrics are easily seen to be identical. Thus these two metrics turn
$\mathbb{R}$ into the same measurable space.

The metric $(x,y) \mapsto |\arctan y - \arctan x|$ for $\mathbb{R}$ can be extended to a metric
for $\overline{\mathbb{R}}$ by defining $\arctan \infty = \pi/2$ and $\arctan(-\infty) = -\pi/2$. With this metric
large finite real numbers are ‘close’ to $\infty$ and negative finite real numbers of large
absolute value are ‘close’ to $-\infty$. The metric space $\overline{\mathbb{R}}$ is complete and compact.
The function $x \mapsto \arctan x$ from $\overline{\mathbb{R}}$ to $[-\pi/2, \pi/2]$ is a continuous bijection with a
continuous inverse, where the metric for $[-\pi/2, \pi/2]$ is $(u,v) \mapsto |u - v|$.
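The behaviour of the arctangent metric on the extended real line can be explored numerically. The sketch below is only an illustration: the names `arctan_ext` and `rho` are hypothetical, and the sample points are arbitrary choices.

```python
import math


def arctan_ext(x):
    """arctan extended to the two infinite points, as in the text."""
    if x == math.inf:
        return math.pi / 2
    if x == -math.inf:
        return -math.pi / 2
    return math.atan(x)


def rho(x, y):
    """The metric (x, y) -> |arctan y - arctan x| on the extended reals."""
    return abs(arctan_ext(y) - arctan_ext(x))


# Large finite numbers are close to infinity in this metric.
d_far = rho(1e6, math.inf)

# The whole space has finite diameter, namely pi.
diameter = rho(-math.inf, math.inf)

# Spot-check the triangle inequality on a few sample points,
# allowing a tiny tolerance for floating-point rounding.
samples = [-math.inf, -10.0, -1.0, 0.0, 2.5, 1e6, math.inf]
triangle_ok = all(
    rho(x, z) <= rho(x, y) + rho(y, z) + 1e-12
    for x in samples
    for y in samples
    for z in samples
)
```

A spot check of this kind is not a proof of the triangle inequality; it only makes the boundedness of the metric concrete.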

In $\mathbb{R}^d$ we let $|x|$ denote the Euclidean distance from the point $x \in \mathbb{R}^d$ to the
origin. The function $(x,y) \mapsto |x - y|$ is a metric for $\mathbb{R}^d$ which turns $\mathbb{R}^d$ into a
complete metric space.

The standard way of making C[0, 1], the set of continuous functions on the 
interval [0,1], into a metric space is to define the distance between f and g to 


equal $\max\{|f(t) - g(t)| : 0 \le t \le 1\}$.
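A rough numerical feel for this distance can be had by evaluating the two functions on a finite grid. The sketch below is an approximation only: the function name `sup_distance` and the grid size are assumptions, and a grid maximum can underestimate the true maximum when it is attained between grid points.

```python
def sup_distance(f, g, n=1000):
    """Approximate max |f(t) - g(t)| over [0, 1] using the grid k/n."""
    return max(abs(f(k / n) - g(k / n)) for k in range(n + 1))


# Hypothetical example: f(t) = t and g(t) = t^2. The true distance is 1/4,
# attained at t = 1/2, which lies on the grid.
d = sup_distance(lambda t: t, lambda t: t * t)
```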


Problem 8. Show that the closed ball of radius 1 centered at the 0 function in
$C[0,1]$ is not totally bounded. Hint: Construct an infinite sequence $(f_n : n =
1, 2, \dots)$ in $B$ such that the distance between $f_m$ and $f_n$ equals 2 for $m \ne n$.


Theorem 5. [Arzela-Ascoli] A subset $A$ of $C[0,1]$ is relatively sequentially
compact if and only if $\{f(0) : f \in A\}$ is a bounded set of real numbers and, for
every $\varepsilon > 0$, there exists $\delta > 0$ such that $|f(x) - f(y)| < \varepsilon$ whenever $|x - y| < \delta$
and $f \in A$.


We omit the proof of this theorem. Notice that $\delta$ in it is not permitted to
depend on $f$. A set $A$ of functions is said to be equicontinuous at a point $x$ if for
every $\varepsilon > 0$ there exists $\delta > 0$ such that $|f(y) - f(x)| < \varepsilon$ whenever $|y - x| < \delta$
and $f \in A$. If $\delta$ can be chosen to be independent of $x$, then the family $A$ is said to
be uniformly equicontinuous. Thus, the Arzela-Ascoli Theorem can be stated as:
a subset of $C[0,1]$ is relatively sequentially compact if and only if it is uniformly
equicontinuous and the set of its values at 0 is bounded. Moreover, ‘uniformly’
need not be mentioned because it is a consequence of equicontinuity at each point
and the fact that $[0,1]$ is compact. [In fact, some people use ‘equicontinuous’ to
mean ‘uniformly equicontinuous’.]


APPENDIX C 
Topological Spaces 


Metric spaces, described in Appendix B, are examples of a more general
structure that will be described in this appendix. 


C.1. Concepts 


A topological space is a pair $(\Omega, \mathcal{O})$ where $\Omega$ is a set and $\mathcal{O}$ a family of subsets
of $\Omega$ satisfying the following properties:

(i) $\mathcal{O}$ is closed under arbitrary unions;
(ii) $\mathcal{O}$ is closed under finite intersections;
(iii) $\emptyset \in \mathcal{O}$;
(iv) $\Omega \in \mathcal{O}$.

The collection $\mathcal{O}$ is called a topology on $\Omega$. (Properties (iii) and (iv) are redundant
in view of the standard convention that the union and intersection of an empty
collection of sets are the empty set and universal set, respectively.) The members
of $\mathcal{O}$ are said to be open and their complements are closed. (This use of ‘closed’
should not be confused with its use to describe an operation as in (i) and (ii)
above.) Often one refers to a topological space by mentioning the universal set
$\Omega$, rather than both $\Omega$ and the topology $\mathcal{O}$. When doing this care is required,
since it is possible, as illustrated by Problem 18, for two different topological
spaces having the same universal set to appear in the same discussion.
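On a finite universal set the defining properties of a topology can be checked mechanically. The following sketch is illustrative only: the function name `is_topology` is an assumption, properties (iii) and (iv) are tested explicitly rather than derived from the empty-collection convention, and for a finite family it suffices to test pairwise unions and intersections.

```python
from itertools import combinations


def is_topology(universe, family):
    """Check the topology axioms for a family of subsets of a finite set."""
    opens = {frozenset(s) for s in family}
    # Properties (iii) and (iv): the empty set and the universal set are open.
    if frozenset() not in opens or frozenset(universe) not in opens:
        return False
    # For a finite family, closure under pairwise unions and intersections
    # gives closure under all unions and all finite intersections.
    for a, b in combinations(opens, 2):
        if a | b not in opens or a & b not in opens:
            return False
    return True


universe = {1, 2, 3}
chain = [set(), {1}, {1, 2}, {1, 2, 3}]       # a nested family
not_closed = [set(), {1}, {2}, {1, 2, 3}]     # {1} | {2} is missing
```

Here `is_topology(universe, chain)` holds, while `is_topology(universe, not_closed)` fails because the union of `{1}` and `{2}` is absent from the family.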

It is easily shown that a set $C$ in a topological space has a largest open
subset, which is its interior, and a smallest closed superset, which is its closure.
The boundary of $C$, denoted by $\partial C$, consists of those points in its closure that
are not also in its interior.


Problem 1. The solution of some problem in Appendix B contains a proof that 
every metric space is a topological space. Which problem is that? 


Problem 2. Why is it true that, with one exception, every topological space has 
at least two sets having the property of being both open and closed? What is the 
one exception? 




Problem 3. Prove that the collection C of closed sets in a topological space has the 
following two properties: 

(i) C is closed under arbitrary intersections; 

(ii) C is closed under finite unions. 


Problem 4. Prove that every set in a topological space has both an interior and a 
closure. 


A neighborhood of a point in a topological space is any set that contains some 
open set of which the point is a member. (Some people place an additional 
condition on a set for it to be a neighborhood of a point—namely, that it itself 
be open.) A topological space is Hausdorff if for any two distinct points x and w in
the space there exist neighborhoods of x and w that have empty intersection. 
Hausdorff spaces almost always suffice for applications to probability. 


* Problem 5. Prove that a point $x$ belongs to the boundary of a set $B$ if and only if
every neighborhood of $x$ contains at least one point in $B$ and at least one point in
$B^c$.


* Problem 6. Prove that the boundary of a set is also the boundary of its complement.


Problem 7. Prove that every metric space is Hausdorff. 


A subset of a topological space Q is compact if every open cover of it has a 
finite subcovering. It is relatively compact if its closure is compact. In case Q 
itself is compact, the topological space is a compact space. The next two results 
describe connections between compactness and closedness. 


Proposition 1. A closed subset of a compact set in a topological space is a 
compact subset in that topological space. 


Problem 8. Prove the preceding proposition. 


Proposition 2. Every compact set in a Hausdorff space is closed. 


PROOF. For a proof by contradiction suppose that $B$ is a compact set that is
not closed. Let $x \in \overline{B} \setminus B$ and let $w$ be any member of $B$. Since the topological
space is Hausdorff, there exist neighborhoods of $x$ and $w$ that have empty
intersection, neighborhoods that with no loss of generality we may take to be open.
The complement of the open neighborhood of $w$ is a closed neighborhood of $x$.
Therefore, the collection

$$\{N^c : N \text{ a closed neighborhood of } x\}$$

is an open covering of $B$. Because $B$ is compact this covering contains a finite
subcovering, say

$$\{N_1^c, N_2^c, \dots, N_k^c\}.$$

Let $O_i$ denote the interior of $N_i$, $1 \le i \le k$. No point in $O = \bigcap_{i=1}^{k} O_i$ is covered
by the finite subcovering, but $O$, being the intersection of a finite number of open
neighborhoods of $x$, is itself an open neighborhood of $x$, and, because $x \in \partial B$,
$O$ contains a member of $B$ by Problem 5, a member that is not covered by the
finite subcovering. Therefore, we have arrived at the desired contradiction. $\square$


The following problem introduces a topological space that cannot be viewed 
as a metric space, no matter how one chooses to specify a metric. 


* Problem 9. Let $\mathcal{O}$ consist of all subsets $O$ of $\mathbb{R}$ having the property that for every
$x \in O$ there exists $\varepsilon > 0$ such that the interval $[x, x + \varepsilon) \subseteq O$. Prove $(\mathbb{R}, \mathcal{O})$ is
a topological space. For this topology, decide which intervals are open, which are
closed, and which are compact.


C.2. Compactification 


Sometimes there are strong reasons for working with a compact topological space 
even when the topological space of interest is not compact. 

Let $(\Omega, \mathcal{O})$ be a topological space and adjoin to $\Omega$ an additional point, call it
$\infty$, to obtain a set $\Omega^* = \Omega \cup \{\infty\}$. Let $\mathcal{O}^*$ consist of all members of $\mathcal{O}$ and all
subsets of $\Omega^*$ whose complements in $\Omega^*$ are compact subsets of $\Omega$.


* Problem 10. Prove that $(\Omega^*, \mathcal{O}^*)$, as defined in the preceding paragraph, is a
compact space.


The topological space $(\Omega^*, \mathcal{O}^*)$ introduced above is called the one-point
compactification of the topological space $(\Omega, \mathcal{O})$.


Example 1. The one-point compactification of the real line $\mathbb{R}$ with the usual
topology has the effect of ‘putting’ negative numbers of large absolute value and
large positive numbers ‘close’ to the same member, $\infty$, of the compactification.

A more commonly used compactification of $\mathbb{R}$ is its two-point compactification
$\overline{\mathbb{R}} = \mathbb{R} \cup \{-\infty, \infty\}$. The open sets in $\overline{\mathbb{R}}$ are the open sets of $\mathbb{R}$, sets of the form
$[-\infty, x)$ for some $x \in \mathbb{R}$, sets of the form $(x, \infty]$ for some $x \in \mathbb{R}$, and unions of
sets of these types.


Problem 11. Prove that $\overline{\mathbb{R}}$ as just described is a topological space. Also, show that
this topological space is the one induced by the metric $(x,y) \mapsto |\arctan x - \arctan y|$
which was introduced in the last section of Appendix B.




C.3. Product topologies 
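Let $J$ be an arbitrary index set. For $j \in J$, let $(\Omega_j, \mathcal{O}_j)$ be a topological space.
Set

$$\Omega = \prod_{j \in J} \Omega_j,$$

$$\mathcal{N} = \Bigl\{ \prod_{j \in J} O_j : O_j \text{ open in } \Omega_j \text{ and } O_j = \Omega_j \text{ for all but finitely many } j \Bigr\},$$

and $\mathcal{O}$ equal to the collection of all unions of members of $\mathcal{N}$.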


Problem 12. Prove that $(\Omega, \mathcal{O})$ as just described is a topological space.


The collection $\mathcal{O}$ defined above is called the product topology of the topologies
$\mathcal{O}_j$, and the topological space $(\Omega, \mathcal{O})$ is the product of the topological spaces
$(\Omega_j, \mathcal{O}_j)$. We omit the proof of the following important theorem about product spaces.


Theorem 3. [Tychonoff] A product of compact spaces is a compact space. 


Example 2. By the preceding theorem $\overline{\mathbb{R}}^{\infty} = \prod_{j=1}^{\infty} \overline{\mathbb{R}}$ is compact; that is,
the set of all sequences in $\overline{\mathbb{R}}$ is compact.


C.4. Relative topology 
For $(\Omega, \mathcal{O})$ a topological space and $\Psi \subseteq \Omega$, let
$$\mathcal{P} = \{O \cap \Psi : O \in \mathcal{O}\}.$$
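For a finite example, the family obtained by intersecting every open set with a fixed subset can be computed directly. In the sketch below the function name `relative_topology`, the sample topology, and the chosen subset are all illustrative assumptions.

```python
def relative_topology(family, subset):
    """Return the family of intersections of the open sets with the subset."""
    psi = frozenset(subset)
    return {frozenset(o) & psi for o in family}


# A hypothetical topology on {1, 2, 3} and its trace on the subset {1, 2}.
topology = [set(), {1}, {1, 2}, {1, 2, 3}]
trace = relative_topology(topology, {1, 2})
```

Using a Python set for the result removes duplicates automatically, since different open sets may have the same intersection with the subset.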


Problem 13. Prove that $(\Psi, \mathcal{P})$ as just defined is a topological space.


The topological space $(\Psi, \mathcal{P})$ described above is called a topological subspace
of $(\Omega, \mathcal{O})$ and $\mathcal{P}$ is the relative topology on $\Psi$. The next exercise shows that sets
can change their topological character when a topology is replaced by its relative
topology, but the subsequent proposition shows that the compactness property
is stable under such a replacement.


* Problem 14. Give an example that shows that a set that is not a member of a
topology $\mathcal{O}$ may be open in a relative topology induced by $\mathcal{O}$ on a set $\Psi$. Prove,
however, that this phenomenon cannot happen if $\Psi \in \mathcal{O}$.


Proposition 4. Let $(\Omega, \mathcal{O})$ be a topological space and $\Psi \subseteq \Omega$. Then a subset
$C$ of $\Psi$ is compact with respect to the topology $\mathcal{O}$ if and only if it is compact with
respect to the relative topology induced by $\mathcal{O}$ on $\Psi$.




Problem 15. Prove the preceding proposition. 


C.5. Limits and continuous functions 


In Definition 5 and Definition 8 we essentially copy appropriate versions of defi- 
nitions that are standard for the topological space R. 


Definition 5. Let $f$ be a function from a topological space $\Psi$ to a topological
space $\Omega$. The function $f$ is said to be continuous at a point $y \in \Psi$ if $f^{-1}(N)$ is
a neighborhood of $y$ for every neighborhood $N$ of $f(y)$. And $f$ is continuous if
it is continuous at each point in $\Psi$.


Proposition 6. A function from one topological space to another is continuous
if and only if the inverse image of every open set is open.


PROOF. Let $f\colon \Psi \to \Omega$. For one direction suppose that the inverse image
under $f$ of every open set in $\Omega$ is open in $\Psi$, and consider an arbitrary $y \in \Psi$
and an arbitrary neighborhood $N$ of $f(y)$. There exists an open set $O \subseteq N$ for
which $f(y) \in O$. Then $f^{-1}(O)$ contains $y$, is open, and is a subset of $f^{-1}(N)$.
Therefore, $f^{-1}(N)$ is a neighborhood of $y$. Since $y$ is arbitrary, $f$ is continuous.

For the other direction, suppose that $f$ is continuous and consider an arbitrary
open set $O$ in $\Omega$. Since $O$ is a neighborhood of each of its members, $f^{-1}(O)$ is a
neighborhood of all of its members and thus, for each $y$ in $f^{-1}(O)$, there exists
an open set $N_y$ such that $y \in N_y \subseteq f^{-1}(O)$. Therefore $f^{-1}(O) = \bigcup_{y \in f^{-1}(O)} N_y$,
which being the union of open sets is open. $\square$


Proposition 7. Let $f\colon \Psi \to \Omega$ be continuous and suppose that $\Psi$ is compact.
Then the image of $f$ is compact.


Problem 16. Prove the preceding proposition.


Problem 17. Use relative topology to adapt the preceding discussion to the case
where the domain of $f$ is a subset of $\Psi$.


Problem 18. Let $f\colon \mathbb{R} \to \mathbb{R}$. Prove that $f$ is right-continuous (as usually defined)
if and only if it is continuous when the domain has the topology of Problem 9 and 
the target has the usual topology. 


Definition 8. Let $f$ be a function with target a topological space $\Omega$ and
domain a subset of a topological space $\Psi$. Let $y \in \Psi$ and suppose that every
neighborhood of $y$ contains a point different from $y$ in the domain of $f$. We say
that

$$\lim_{z \to y} f(z) = x$$

if for every neighborhood $N$ of $x$ there is a neighborhood $M$ of $y$ such that
$f(z) \in N$ whenever $z \ne y$ is in the intersection of $M$ and the domain of $f$.




Problem 19. Suppose that $f\colon \Psi \to \Omega$ is continuous. Prove that

$$\lim_{z \to y} f(z) = f(y)$$

for every $y \in \Psi$ for which the one-point set $\{y\}$ is not open.


Example 3. Make $\mathbb{Z}^+$ into a topological space by calling every subset open,
and let $\overline{\mathbb{Z}}^+$ denote the one-point compactification of $\mathbb{Z}^+$. The compact sets in
$\mathbb{Z}^+$ are the finite sets, so that the neighborhoods of $\infty$ in $\overline{\mathbb{Z}}^+$ are those sets
that contain $\infty$ and have finite complements. Consider an arbitrary function
$f\colon \mathbb{Z}^+ \to \Omega$, where $\Omega$ is any topological space. From the preceding discussion it
follows that

$$\lim_{z \to \infty} f(z) = x$$

if and only if for every neighborhood $N$ of $x$ there is a member $m$ of $\mathbb{Z}^+$ such
that $f(z) \in N$ whenever $z > m$. Hence, we see that sequential convergence is
encompassed by Definition 8.

Comment: Another way to view the topology on $\overline{\mathbb{Z}}^+$ is that it is the relative
topology induced by the usual topology on $\overline{\mathbb{R}}$.


Proposition 9. Suppose that a sequence $(x_n : n = 1, 2, \dots)$ of points in a
closed set $C$ in a topological space $\Omega$ converges to a point $x \in \Omega$. Then $x \in C$.


Problem 20. Prove the preceding proposition. 


Theorem 10. Any sequence in a compact Hausdorff space has a convergent 
subsequence. 


Problem 21. Prove the preceding theorem. 
Problem 22. Let $(\Omega, \mathcal{O})$ be the product of topological spaces $(\Omega_j, \mathcal{O}_j)$, $j = 1, 2, \dots$.
For each $n = 1, 2, \dots$, let
$$\omega_n = (\omega_{n,1}, \omega_{n,2}, \omega_{n,3}, \dots)$$
be a point in $\Omega$. Show that the sequence $(\omega_n : n = 1, 2, \dots)$ converges in $\Omega$ if and
only if the sequence $(\omega_{n,j} : n = 1, 2, \dots)$ converges in $\Omega_j$ for each fixed $j$.


Problem 23. In the topological space of Problem 9 find an infinite sequence that 
does not converge even though it would converge were the topology the usual 
topology for R. 


APPENDIX D 
Riemann-Stieltjes Integration 


The Riemann integral $\int_a^b f(x)\,dx$ is, by definition, the limit of sums of the form

$$\sum_{j=1}^{n} f(\xi_j)\,(x_j - x_{j-1}),$$

where $a = x_0 < x_1 < \cdots < x_n = b$ and $\xi_j \in [x_{j-1}, x_j]$ for each $j$. In this appendix
we replace the differences $x_j - x_{j-1}$ by $g(x_j) - g(x_{j-1})$ for some function $g$. This
procedure leads to a type of integral that lies somewhere between the Riemann 
integral and the Lebesgue integral in generality. One advantage that this integral 
has over the Lebesgue integral is that it satisfies an integration by parts formula 
that can be quite useful for calculational purposes. 


D.1. The Riemann-Stieltjes integral 


The basic setting consists of two functions f and g defined on a closed bounded 
interval [a,b] of the real line. By a point partition of the interval [a,b] we mean a 
finite subset of [a, b] containing both a and b. (In many books a point partition is 
identified by the one-word term ‘partition’, which we use to denote a partition of 
a set.) We typically write the members of a point partition in increasing order. 
Thus, when we say that $\{x_0, x_1, \dots, x_n\}$ is a point partition of $[a,b]$, it is to be
understood that

$$a = x_0 < x_1 < \cdots < x_n = b.$$

To emphasize this point we may use the contrived notation $\{a = x_0 < x_1 < \cdots <
x_n = b\}$, possibly omitting $a$ and $b$ from the notation if the interval on which the
point partition is based is clear from context. The mesh of the point partition
$\{a = x_0 < x_1 < \cdots < x_n = b\}$ is the maximum of the numbers $x_j - x_{j-1}$, $1 \le j \le n$.

A point partition of the interval [a,b] is said to be a refinement of a second 
point partition if it contains the second point partition as a subset. 




A Riemann-Stieltjes sum of $f$ with respect to $g$ corresponding to a point
partition $\{x_0 < x_1 < \cdots < x_n\}$ of $[a,b]$ is a sum

$$\sum_{j=1}^{n} f(\xi_j)\,[g(x_j) - g(x_{j-1})],$$

where $\xi_j \in [x_{j-1}, x_j]$ for each $j$. Since each $\xi_j$ is only constrained to lie in a
certain interval, there are typically many Riemann-Stieltjes sums corresponding 
to a particular point partition. 
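The dependence of a Riemann-Stieltjes sum on the choice of the intermediate points can be seen in a short computation. The sketch below is an illustration under stated assumptions: the function name `rs_sum`, the uniform partition of $[0,1]$, and the choice $f(x) = x$, $g(x) = x^2$ are all hypothetical. For this choice the integral equals $2/3$, since $\int_0^1 x \cdot 2x\,dx = 2/3$.

```python
def rs_sum(f, g, points, tags):
    """Riemann-Stieltjes sum of f with respect to g.

    points: x_0 < x_1 < ... < x_n (a point partition, in increasing order)
    tags:   xi_1, ..., xi_n with xi_j in [x_{j-1}, x_j]
    """
    return sum(
        f(xi) * (g(x1) - g(x0))
        for xi, x0, x1 in zip(tags, points, points[1:])
    )


n = 1000
points = [j / n for j in range(n + 1)]


def f(x):
    return x


def g(x):
    return x * x


# Three different Riemann-Stieltjes sums for the same point partition.
left = rs_sum(f, g, points, points[:-1])
right = rs_sum(f, g, points, points[1:])
mid = rs_sum(f, g, points, [(a + b) / 2 for a, b in zip(points, points[1:])])
```

Because both $f$ and $g$ are increasing here, the left-endpoint and right-endpoint sums bracket every other Riemann-Stieltjes sum for this partition.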


Definition 1. The function $f$ is Riemann-Stieltjes integrable with respect to
the function $g$ on the interval $[a,b]$ if there is some number $y$ such that for every
$\varepsilon > 0$, there is a point partition $P$ of $[a,b]$ for which the difference between
$y$ and any Riemann-Stieltjes sum of $f$ with respect to $g$ corresponding to any
refinement of $P$ has absolute value less than $\varepsilon$. In case there is such a $y$, the
Riemann-Stieltjes integral of $f$ with respect to $g$ on the interval $[a,b]$ is said to
exist and equal $y$, and one writes

$$y = \int_a^b f\,dg = \int_a^b f(x)\,dg(x),$$

either suppressing the independent variable $x$ or writing it explicitly.


Suppose that $g$ is an increasing function. Any Riemann-Stieltjes sum for a
given point partition is bounded above by the upper Riemann-Stieltjes sum

$$\sum_{j=1}^{n} \sup\{f(\xi) : x_{j-1} \le \xi \le x_j\}\,[g(x_j) - g(x_{j-1})]$$

for that point partition and below by the lower Riemann-Stieltjes sum, obtained
by replacing ‘sup’ by ‘inf’. It is easy to see that $f$ is Riemann-Stieltjes integrable
with respect to $g$ if and only if for every $\varepsilon > 0$ there is a point partition for which
the corresponding upper and lower Riemann-Stieltjes sums are finite and differ
by less than $\varepsilon$.


* Problem 1. Calculate fa x? d|x|, where |x] denotes the largest integer that is 
no larger than $x$.


* Problem 2. Let 


0 ifz<0 
giz) =< 37" if2-"<a<2-™) ne Zt\ {0} 
1 ifg>1. 


Evaluate fest — x°) dg(z). 




Problem 3. Verify the following equalities: 


1 

Bh Be. 

[ za =F 
0 


$$\int_0^1 2F(x)\,dF(x) = F^2(1) - F^2(0) \quad \text{for } F \text{ continuous and increasing.}$$


Problem 4. Let 
z/3 fO<2<1 
F(x)= 4 1/2 ifl<rx<2 
o/s ile 3. 


Prove that I F(x)dF(x) does not exist. 


Problem 5. Prove that on any closed bounded interval, every continuous function 
is Riemann-Stieltjes integrable with respect to every function that is the difference 
of two monotone functions. Hint: Use the uniform continuity of the continuous 
function. 


Problem 6. Let f and g be monotone functions on an interval [a,b] and suppose 
that f is left-continuous and g is right-continuous. Prove that f is Riemann- 
Stieltjes integrable with respect to g. 


It is important to notice that there are no differentiability assumptions in 
Problem 5 or Problem 6. 


D.2. Relation to the Riemann integral 


The following proposition shows how to change some Riemann-Stieltjes integrals 
into Riemann integrals which can then often be evaluated by using the Funda- 
mental Theorem of Calculus. 


Proposition 2. Let $g$ be a function with a continuous first derivative on an
interval $[a,b]$ and $f$ a bounded $\mathbb{R}$-valued function on $[a,b]$. Then $fg'$ is Riemann
integrable on $[a,b]$ if and only if $f$ is Riemann-Stieltjes integrable with respect
to $g$ on $[a,b]$, in which case

$$\int_a^b f(x)\,dg(x) = \int_a^b f(x)\,g'(x)\,dx. \tag{D.1}$$


PROOF. Let $P = \{a = x_0 < x_1 < \cdots < x_n = b\}$ be a point partition of $[a,b]$
and let $\xi_j \in [x_{j-1}, x_j]$ for $1 \le j \le n$. The corresponding Riemann sum of $fg'$ is

$$\sum_{j=1}^{n} f(\xi_j)\,g'(\xi_j)\,(x_j - x_{j-1}),$$

and the corresponding Riemann-Stieltjes sum of $f$ with respect to $g$ is

$$\sum_{j=1}^{n} f(\xi_j)\,[g(x_j) - g(x_{j-1})].$$

By the Mean-Value Theorem, there exist numbers $\eta_j \in [x_{j-1}, x_j]$ such that this
Riemann-Stieltjes sum equals

$$\sum_{j=1}^{n} f(\xi_j)\,g'(\eta_j)\,(x_j - x_{j-1}).$$

We conclude that the absolute value of the difference between the Riemann sum
of $fg'$ and the Riemann-Stieltjes sum of $f$ with respect to $g$ is bounded by

$$\sum_{j=1}^{n} |f(\xi_j)| \cdot |g'(\xi_j) - g'(\eta_j)|\,(x_j - x_{j-1}). \tag{D.2}$$

The quantity (D.2) is bounded by the product of three numbers: any bound
$s$ of $|f|$, $(b - a)$, and the maximum of $|g'(v) - g'(u)|$ taken over $u, v \in [x_{j-1}, x_j]$,
$1 \le j \le n$. Because $g'$ is uniformly continuous on $[a,b]$, the third of these factors
can be made arbitrarily small, say less than some $\varepsilon$, by taking the mesh of $P$ to be
sufficiently small. For all refinements of
such a $P$ there is a correspondence between Riemann sums of $fg'$ and Riemann-
Stieltjes sums of $f$ with respect to $g$ such that corresponding sums differ by less
than $s(b - a)\varepsilon$. The desired conclusion follows. $\square$
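The estimate in the proof can be checked numerically for a particular pair of functions. The following sketch is only an illustration: the names `rs_sum` and `riemann_sum`, the choice $f = \cos$, $g(x) = x^3$ on $[0,1]$, and the uniform partition are assumptions. For this choice $|f| \le 1$, $b - a = 1$, and $|g'(v) - g'(u)| \le 6/n$ on each subinterval of length $1/n$, because $|g''| \le 6$ on $[0,1]$.

```python
import math


def rs_sum(f, g, points, tags):
    """Riemann-Stieltjes sum of f with respect to g."""
    return sum(
        f(xi) * (g(x1) - g(x0))
        for xi, x0, x1 in zip(tags, points, points[1:])
    )


def riemann_sum(h, points, tags):
    """Ordinary Riemann sum of h for the same partition and tags."""
    return sum(
        h(xi) * (x1 - x0)
        for xi, x0, x1 in zip(tags, points, points[1:])
    )


def g(x):
    return x**3


def g_prime(x):
    return 3 * x**2


f = math.cos
n = 1000
points = [j / n for j in range(n + 1)]
tags = points[1:]  # right endpoints, one tag per subinterval

stieltjes = rs_sum(f, g, points, tags)
riemann = riemann_sum(lambda x: f(x) * g_prime(x), points, tags)

# Bound from the proof: s * (b - a) * max |g'(v) - g'(u)| with s = 1, b - a = 1.
bound = 1.0 * 1.0 * 6.0 / n
gap = abs(stieltjes - riemann)

# Closed form of the Riemann integral of 3 x^2 cos x over [0, 1], for comparison.
exact = 3 * (2 * math.cos(1) - math.sin(1))
```

In this example `gap` stays below `bound`, and both sums are close to `exact`.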


Problem 7. Discuss how Proposition 2 might be of use in treating an integral 
f f dg even if g does not satisfy all the conditions in that proposition. 


Problem 8. Evaluate f? , |x—1|d|x| by using Proposition 2, the Fundamental The- 
orem of Calculus, and your response for Problem 7. Do this problem by breaking 
the integral into no more than two pieces for the application of the Fundamental 
Theorem. 


D.3. Change of variables 


The formula for making the same change of variables in both functions of a 
Riemann-Stieltjes integral is easy to remember. 


Proposition 3. Let $\varphi$ be a strictly increasing continuous function on an
interval $[a,b]$. Then

$$\int_a^b (f \circ \varphi)\,d(g \circ \varphi) = \int_{\varphi(a)}^{\varphi(b)} f\,dg,$$

in the sense that if either side exists then so does the other and they are equal.
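The change of variables formula can be illustrated numerically by pushing a point partition of $[a,b]$ forward through the substitution. The sketch below rests on assumptions chosen for convenience: the helper name `rs_sum`, the substitution $t \mapsto t^2$ on $[0,1]$, and $f(x) = g(x) = x$, for which both sides equal $1/2$.

```python
def rs_sum(f, g, points, tags):
    """Riemann-Stieltjes sum of f with respect to g."""
    return sum(
        f(xi) * (g(x1) - g(x0))
        for xi, x0, x1 in zip(tags, points, points[1:])
    )


def phi(t):
    # Strictly increasing and continuous on [0, 1].
    return t * t


def f(x):
    return x


def g(x):
    return x


n = 1000
t_points = [j / n for j in range(n + 1)]
t_tags = [(a + b) / 2 for a, b in zip(t_points, t_points[1:])]

# Left side: sum for f o phi with respect to g o phi on [0, 1].
lhs = rs_sum(lambda t: f(phi(t)), lambda t: g(phi(t)), t_points, t_tags)

# Right side: sum for f with respect to g on the image partition of
# [phi(0), phi(1)], using the images of the same tags.
rhs = rs_sum(f, g, [phi(t) for t in t_points], [phi(t) for t in t_tags])
```

Since the substitution is increasing, the image of a point partition is again a point partition and the image of each tag lies in the corresponding image subinterval.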


Problem 9. Prove the preceding proposition. 




Problem 10. Without concerning yourself with appropriate hypotheses, show how 
the preceding proposition is related to the usual change of variables formula for 
Riemann integrals. 


D.4. Integration by parts 


The following theorem, which gives a general integration by parts formula, is the 
main reason for the existence of this appendix. 


Theorem 4. Suppose that a function $f$ is Riemann-Stieltjes integrable with
respect to a function $g$ on an interval $[a,b]$. Then $g$ is Riemann-Stieltjes
integrable with respect to $f$ on $[a,b]$ and

$$\int_a^b f\,dg = f(b)g(b) - f(a)g(a) - \int_a^b g\,df.$$


PROOF. Set $y = \int_a^b f\,dg$. Let $\varepsilon > 0$ and choose a point partition $P$ such that
for every point partition $\{x_0, x_1, \dots, x_n\}$ that is a refinement of $P$ and every
choice of $\xi_j \in [x_{j-1}, x_j]$,

$$\Bigl| \sum_{j=1}^{n} f(\xi_j)\,[g(x_j) - g(x_{j-1})] - y \Bigr| < \varepsilon.$$

For such a point partition consider an arbitrary Riemann-Stieltjes sum for $g$
with respect to $f$:

$$\sum_{i=1}^{n} g(\eta_i)\,[f(x_i) - f(x_{i-1})].$$

We set $\eta_0 = a$ and $\eta_{n+1} = b$ in order to rewrite this Riemann-Stieltjes sum as

$$f(b)g(b) - f(a)g(a) - \sum_{i=1}^{n+1} f(x_{i-1})\,[g(\eta_i) - g(\eta_{i-1})]$$

$$= f(b)g(b) - f(a)g(a) - \sum_{i=1}^{n+1} f(x_{i-1})\,[g(x_{i-1}) - g(\eta_{i-1})] - \sum_{i=1}^{n+1} f(x_{i-1})\,[g(\eta_i) - g(x_{i-1})]. \tag{D.3}$$

The combination of these last two summations is the negative of a Riemann-
Stieltjes sum of $f$ with respect to $g$ for the point partition

$$P' = \{a = x_0 = \eta_0 \le \eta_1 \le x_1 \le \eta_2 \le \cdots \le \eta_n \le x_n = \eta_{n+1} = b\},$$

the possibility of equality in this description of $P'$ causing no problem, but only
indicating that there may be less than $2n$ subintervals determined by $P'$. Since
$P'$ is a refinement of $P$, (D.3) differs from $f(b)g(b) - f(a)g(a) - y$ by less than
$\varepsilon$. Since $\varepsilon$ is arbitrary the proof is complete. $\square$




It is worth noticing that the integration by parts formula is symmetric in f 
and g. It is also worth observing that the above proof uses a simple technique 
called summation by parts that has a slightly messy appearance in the proof 
because of the interlacing of two sequences. Here is the simple useful formula 
isolated by itself. 


Proposition 5. Let $(a_0, a_1, \dots, a_n)$ and $(b_0, b_1, \dots, b_n)$ be two finite sequences
of real numbers. Then

$$\sum_{j=1}^{n} a_j\,[b_j - b_{j-1}] = a_n b_n - a_0 b_0 - \sum_{i=1}^{n} b_{i-1}\,[a_i - a_{i-1}].$$
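The summation by parts formula, $\sum_{j=1}^{n} a_j [b_j - b_{j-1}] = a_n b_n - a_0 b_0 - \sum_{i=1}^{n} b_{i-1} [a_i - a_{i-1}]$, is easy to test on concrete sequences. The sketch below uses integer sequences so that the comparison is exact; the function names and the sample values are illustrative assumptions.

```python
def parts_lhs(a, b):
    """Sum over j = 1..n of a_j * (b_j - b_{j-1})."""
    return sum(a[j] * (b[j] - b[j - 1]) for j in range(1, len(a)))


def parts_rhs(a, b):
    """a_n b_n - a_0 b_0 - sum over i = 1..n of b_{i-1} * (a_i - a_{i-1})."""
    n = len(a) - 1
    tail = sum(b[i - 1] * (a[i] - a[i - 1]) for i in range(1, n + 1))
    return a[n] * b[n] - a[0] * b[0] - tail


# Hypothetical sample sequences (a_0, ..., a_4) and (b_0, ..., b_4).
a = [2, -1, 4, 0, 7]
b = [1, 3, -2, 5, 5]

assert parts_lhs(a, b) == parts_rhs(a, b) == -22
```

Working with integers avoids any floating-point rounding, so equality can be asserted directly.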


Problem 11. Convince yourself that the preceding proposition is true. 
Problem 12. Redo Problem 1 by using integration by parts. 


Problem 13. Show that all monotone functions on an interval are Riemann-Stieltjes 
integrable with respect to every continuous function on that interval (even a con- 
tinuous function whose derivative exists nowhere). Hint: Use Problem 5 and an 
important theorem. 


* Problem 14. Let $f$ be an $\mathbb{R}$-valued function on an interval $[a,b]$, and suppose that
for every $x \in [a,b]$,

$$f(x+) = \lim_{y \downarrow x} f(y) \quad\text{and}\quad f(x-) = \lim_{y \uparrow x} f(y)$$

both exist as members of $\mathbb{R}$. Show that if $g$ has a continuous derivative on $[a,b]$,
then $f$ and $g$ are Riemann-Stieltjes integrable with respect to each other on $[a,b]$.


D.5. Improper Riemann-Stieltjes integrals 


The treatment of improper Riemann-Stieltjes integrals parallels that of improper 
Riemann integrals. In particular, when one is using various theorems about 
Riemann-Stieltjes integrals, such as integration by parts, it is wise to first write a 
given improper Riemann-Stieltjes integral as a limit of proper Riemann-Stieltjes 
integrals, then use the theorems, and finally pass to the limit. 


Problem 15. Replace the interval $[-7, 5]$ of integration in Problem 2 by the interval
$(-\infty, \infty)$ and then do the problem created by this replacement.


Problem 16. Does the improper integral 


[Yas 
0 T 


exist as a finite number? (Here $\lceil x \rceil$ and $\lfloor x \rfloor$ denote the smallest integer larger
than or equal to $x$ and the largest integer less than or equal to $x$, respectively.)
Give attention to the issue of existence of appropriate proper Riemann-Stieltjes 
integrals. 


APPENDIX E 


Taylor Approximations, 
C-Valued Logarithms 


For some portions of this book it is important to have a definition of $\log\circ\beta$
for a complex-valued function $\beta$. Under some restrictions, such a definition is
presented in Section 2. A second theme appears throughout this appendix:
that of approximating or bounding transcendental functions by polynomials.


E.1. Some inequalities based on the Taylor formula 


From the Taylor formula with remainder one easily gets the following families of 
inequalities: 


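$$
\begin{aligned}
1 - \frac{v^2}{2} &\le \cos v \le 1, & v &\in \mathbb{R},\\
1 - \frac{v^2}{2} + \frac{v^4}{24} - \frac{v^6}{720} &\le \cos v \le 1 - \frac{v^2}{2} + \frac{v^4}{24}, & v &\in \mathbb{R},\\
v - \frac{v^3}{6} &\le \sin v \le v, & v &\in \mathbb{R}^+,\\
v - \frac{v^3}{6} + \frac{v^5}{120} - \frac{v^7}{5040} &\le \sin v \le v - \frac{v^3}{6} + \frac{v^5}{120}, & v &\in \mathbb{R}^+,\\
1 + x &\le e^x \le 1, & x &\in \mathbb{R}^-,\\
1 + x + \frac{x^2}{2} + \frac{x^3}{6} &\le e^x \le 1 + x + \frac{x^2}{2}, & x &\in \mathbb{R}^-,\\
x - \frac{x^2}{2} &\le \log(1+x) \le x, & x &\in \mathbb{R}^+,\\
x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} &\le \log(1+x) \le x - \frac{x^2}{2} + \frac{x^3}{3}, & x &\in \mathbb{R}^+,
\end{aligned}
$$

$$e^x \ge \sum_{k=0}^{n} \frac{x^k}{k!}, \quad x \in \mathbb{R}^+; \tag{E.1}$$

and

$$\log(1 - x) \le -\sum_{k=1}^{n} \frac{x^k}{k}, \quad x \in [0,1). \tag{E.2}$$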


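The families (E.1) and (E.2) are one-sided. One technique for getting an
inequality in the opposite direction is to use a geometric bound on the tail of the
infinite Taylor series. For $x \in [0, n)$,

$$
\begin{aligned}
e^x &= \sum_{k=0}^{n-1} \frac{x^k}{k!} + \sum_{k=n}^{\infty} \frac{x^k}{k!} \\
&\le \sum_{k=0}^{n-1} \frac{x^k}{k!} + \frac{x^n}{n!} \sum_{k=n}^{\infty} \Bigl(\frac{x}{n}\Bigr)^{k-n}
= \sum_{k=0}^{n-1} \frac{x^k}{k!} + \frac{x^n}{(n-1)!\,(n-x)}.
\end{aligned}
$$

For $x \in [0, 1)$,

$$\log(1 - x) = -\sum_{k=1}^{n-1} \frac{x^k}{k} - \sum_{k=n}^{\infty} \frac{x^k}{k} \ge -\sum_{k=1}^{n-1} \frac{x^k}{k} - \frac{x^n}{n(1-x)}. \tag{E.3}$$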
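
The following Python sketch illustrates how tight these two-sided bounds are. It assumes the readings that (E.1) and (E.2) bound e^x and log(1 - x) by the Taylor polynomial through the term of index n, and that the geometric tail bounds are x^n/((n-1)!(n-x)) for the exponential and x^n/(n(1-x)) for the logarithm. The function names and the sample values of x and n are hypothetical.

```python
import math

def exp_bounds(x, n):
    """Lower and upper bounds for e^x, assuming 0 <= x < n and n >= 1."""
    partial = sum(x**k / math.factorial(k) for k in range(n))
    lower = partial + x**n / math.factorial(n)                  # Taylor sum through index n
    upper = partial + x**n / (math.factorial(n - 1) * (n - x))  # geometric tail bound
    return lower, upper

def log1m_bounds(x, n):
    """Lower and upper bounds for log(1 - x), assuming 0 <= x < 1 and n >= 1."""
    partial = -sum(x**k / k for k in range(1, n))
    upper = partial - x**n / n              # Taylor sum through index n
    lower = partial - x**n / (n * (1 - x))  # geometric tail bound
    return lower, upper

e_lo, e_hi = exp_bounds(1.5, 4)
l_lo, l_hi = log1m_bounds(0.5, 3)
print(e_lo, math.exp(1.5), e_hi)
print(l_lo, math.log(0.5), l_hi)
```

In both cases the gap between the lower and upper bound shrinks quickly as n grows, provided x stays well inside the stated interval.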
For small $x$, the inequality $\sin x \le x - \frac{x^3}{6} + \frac{x^5}{120}$ is an improvement over
$\sin x \le x$. For large $x$ the simpler $\sin x \le x$ may be better, but neither will be
very good. Elementary extra information can lead to inequalities that are quite
good for all values of the variable. Here are two of the most important of such
inequalities:

$$|\sin x| \le |x| \wedge 1, \quad x \in \mathbb{R}; \tag{E.4}$$

$$1 - \cos x \le \frac{x^2}{2} \wedge 2, \quad x \in \mathbb{R}. \tag{E.5}$$


Finite and infinite products of real numbers are often treated via their loga- 
rithms if all factors are positive. If there is at least one zero factor, the product 
is 0. If there are some negative factors, the sign can be treated separately and a 
product of positive numbers then studied. 


Problem 1. Let $(a_n : n = 1, 2, \dots)$ be a sequence of numbers in $[0, 1)$. Show that
$\sum_{n=1}^{\infty} a_n < \infty \iff \prod_{n=1}^{\infty} (1 - a_n) > 0$.




E.2. Complex exponentials and logarithms 


Let $z = x + iy$, where $x$ and $y$ are real numbers and $i$ denotes a (nonreal) number
whose square equals $-1$. Then $z$ is called a complex number, and every complex
number can be written in this form. The absolute value of $z$ is the nonnegative
number $\sqrt{x^2 + y^2}$ and is denoted by $|z|$. The real number $x$ is the real part of
$z$ and is denoted by $\Re(z)$. The real number $y$ is the imaginary part of $z$ and is
denoted by $\Im(z)$. We use $\mathbb{C}$ to denote the set of all complex numbers.

If $f\colon \mathbb{R} \to \mathbb{C}$, the derivative of $f$ is defined in the natural way:

$$f' = (\Re \circ f)' + i\,(\Im \circ f)',$$

wherever the right side is defined.
The exponential of a complex number $z$ is defined via

$$e^z = e^{\Re(z)} \bigl(\cos(\Im(z)) + i \sin(\Im(z))\bigr).$$

Problem 2. Prove that $e^{w+z} = e^w e^z$ for $w, z \in \mathbb{C}$.

Problem 3. Let $c \in \mathbb{C}$. Prove that the derivative of the function $t \mapsto e^{ct}$, $t \in \mathbb{R}$, is
the function $t \mapsto c\,e^{ct}$.


The preceding problem is a special case of the following result. 
Proposition 1. Let $\lambda\colon \mathbb{R} \to \mathbb{C}$ be differentiable. Then

$$(\exp \circ \lambda)' = \lambda' \cdot (\exp \circ \lambda).$$

* Problem 4. Prove the preceding proposition. (Comment: The Chain Rule for
functions from $\mathbb{R}$ to $\mathbb{R}$ is available, but does not give the result in one easy step
since exp here is a function from $\mathbb{C}$ to $\mathbb{C}$. There is a Chain Rule for functions from
$\mathbb{C}$ to $\mathbb{C}$, but it does not give a proof because the domain of $\lambda$ is $\mathbb{R}$, and it may not
be possible to extend the domain of $\lambda$ to $\mathbb{C}$ without sacrificing its differentiability.)


Represent the point $(x, y)$ in polar coordinates: $x = r\cos v$, $y = r\sin v$ with
$r \ge 0$. Then $x + iy = re^{iv}$ because $e^{iv} = \cos v + i\sin v$. If $r > 0$, let $u = \log r$ so
that

$$z = x + iy = e^u e^{iv} = e^{u+iv} = e^w,$$

where $w$ is the complex number $u + iv$. Of course, $v$ is determined by $z$ only up
to additive multiples of $2\pi$. Thus for any complex $z \ne 0$, there exist infinitely
many complex numbers $w$ such that $e^w = z$; any two such $w$ differ by an integral
multiple of $2\pi i$. Any such $w$ is called a logarithm of $z$. Thus for example, the
logarithms of $i$ are $\bigl(\frac{\pi}{2} + 2\pi q\bigr) i$, $q \in \mathbb{Z}$.
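
The description of the logarithms of a nonzero complex number can be tried out with Python's standard `cmath` module. In this sketch the helper `logarithms` and the finite range of integers q are hypothetical choices for illustration; `cmath.phase` returns one particular value of the angle v, and the other logarithms are obtained by adding integral multiples of 2πi.

```python
import cmath
import math

def logarithms(z, qs=range(-2, 3)):
    """Return the logarithms u + i(v + 2*pi*q) of a nonzero complex z for q in qs."""
    if z == 0:
        raise ValueError("0 has no logarithm")
    u = math.log(abs(z))   # real logarithm of the positive number |z|
    v = cmath.phase(z)     # one choice of the angle, in (-pi, pi]
    return [complex(u, v + 2 * math.pi * q) for q in qs]

ws = logarithms(1j)
for w in ws:
    # Each w should satisfy e^w = i up to rounding error.
    print(w, cmath.exp(w))
```

Every entry of `ws` exponentiates back to i, and the entry with q = 0 is (π/2)i, in agreement with the example above.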




Problem 5. Find all the logarithms of —2, where, of course, you may express your 
answers in terms of log as representing the real logarithm of a positive real number. 
Also, find all complex logarithms of the real number e and of the complex number 
—3 — 2i. 


Problem 6. Let $z_1$ and $z_2$ be complex numbers different from 0. Prove that the
sum of any logarithm of $z_1$ and any logarithm of $z_2$ equals a logarithm of $z_1 z_2$.


Let $\beta\colon \mathbb{R} \to \mathbb{C}$ be a continuous function satisfying $\beta(0) = 1$. Let $J$ denote the
largest open interval containing 0 in which $\beta$ is never 0.


Problem 7. For $\beta$ and $J$ as just described, prove that there exists a unique con-
tinuous function $\lambda\colon J \to \mathbb{C}$ such that $\lambda(0) = 0$ and $\exp \circ \lambda = \beta$. Despite the fact
that log is not a well-defined function, the notation $\log \circ \beta$ will be used to denote
the function $\lambda$. Prove that for each $v \in J$, $(\log \circ \beta)(v)$ is a logarithm of $\beta(v)$.
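
A rough numerical picture of the continuous logarithm can be obtained by following the function along a fine grid from 0 to v and adding up principal logarithms of successive ratios, each of which is close to 1. This is only a sketch under stated assumptions: the helper `continuous_log`, the grid size, and the test function exp(2it - t^2/2) are hypothetical, and the method relies on the function being continuous and never 0 between 0 and v, with steps small enough that each ratio stays away from the negative real axis.

```python
import cmath
import math

def continuous_log(beta, v, steps=10000):
    """Approximate the continuous logarithm of beta at v, assuming beta(0) = 1."""
    lam = 0j
    prev = beta(0.0)
    for k in range(1, steps + 1):
        cur = beta(v * k / steps)
        lam += cmath.log(cur / prev)  # principal log of a ratio near 1
        prev = cur
    return lam

def beta(t):
    # Example function; its continuous logarithm is 2it - t^2/2.
    return cmath.exp(2j * t - t * t / 2)

lam = continuous_log(beta, 3.0)
principal = cmath.log(beta(3.0))
print(lam, principal, (lam - principal) / (2j * math.pi))
```

At v = 3 the continuous logarithm is -4.5 + 6i, whereas the principal logarithm of the same complex number has imaginary part 6 - 2π; the two differ by 2πi. Both are logarithms of the same number, but only the first varies continuously from the value 0 at 0.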


Problem 8. Let $J$ be an interval in $\mathbb{R}$ and, for each $n = 1, 2, \dots$, let $\beta_n\colon J \to \mathbb{C}$ be a
continuous, nonzero function such that $\beta_n(0) = 1$. Show that if $\beta_n \to \beta$ uniformly
on compact subsets of $J$ for some function $\beta\colon J \to \mathbb{C}$, then $\log \circ \beta_n \to \log \circ \beta$
uniformly on compact subsets of $J$.


* Problem 9. Let $\beta\colon \mathbb{R} \to \mathbb{C}$ and suppose that $\beta$ is continuous and $\beta(0) = 1$. Is it
necessarily true that $(\log \circ \beta)(u_1) = (\log \circ \beta)(u_2)$ if $\beta(u_1) = \beta(u_2)$?


Problem 10. Let $\beta_1$ and $\beta_2$ be continuous functions from $\mathbb{R}$ to $\mathbb{C}$ taking the value
1 at 0. Prove that the domain of $\log \circ (\beta_1 \beta_2)$ is the intersection of the domains of
$\log \circ \beta_1$ and $\log \circ \beta_2$ and that on this intersection

$$\log \circ (\beta_1 \beta_2) = \log \circ \beta_1 + \log \circ \beta_2.$$


Proposition 2. Let $\beta\colon \mathbb{R} \to \mathbb{C}$ and suppose that $\beta$ is differentiable and sat-
isfies $\beta(0) = 1$. Then $(\log \circ \beta)' = \beta'/\beta$ on the domain of $\log \circ \beta$.


Problem 11. Prove the preceding proposition. Hint: This problem may not be as 
easy as it appears. 


For real $y$,

$$e^{iy} = \cos y + i \sin y = \sum_{k=0}^{\infty} \frac{(-1)^k y^{2k}}{(2k)!} + i \sum_{k=0}^{\infty} \frac{(-1)^k y^{2k+1}}{(2k+1)!} = \sum_{n=0}^{\infty} \frac{(iy)^n}{n!}.$$

The Binomial Theorem is valid for complex numbers when the exponent is a
member of $\mathbb{Z}^+$. Using it to multiply the power series for $e^x$ and $e^{iy}$, $x$ and $y$ real,
gives

$$e^z = \sum_{n=0}^{\infty} \frac{z^n}{n!}, \tag{E.6}$$

where $x = \Re(z)$ and $y = \Im(z)$. This series is valid for all complex numbers $z$. We
have obtained this formula by working with functions whose domain is $\mathbb{R}$. There
is a more direct approach using the theory of functions of a complex variable.

For the logarithm it is not such a trivial matter to get a series formula by
only using the theory of functions of a real variable. From the theory of functions
of a complex variable, one gets

$$\log(1 + z) = 2\pi q i + \sum_{n=1}^{\infty} \frac{(-1)^{n-1} z^n}{n}, \quad |z| < 1, \tag{E.7}$$

with the various values of the integer $q$ giving the various logarithms of $1 + z$. In
view of Problem 9, care must be used in applying this series to the compositions
described in the preceding section.

Possibly more important than convergence of the series (E.6) and (E.7) are 
bounds on errors when these series are truncated. 


Problem 12. Use a bounding geometric series to prove that

$$\Bigl| e^z - \sum_{k=0}^{n-1} \frac{z^k}{k!} \Bigr| \le \frac{|z|^n}{(n-1)!\,(n - |z|)}$$

for $n > |z|$.


Problem 13. Use a bounding geometric series to prove that for any choice of $\log(1 + z)$,
there exists an integer $q$ such that

$$\Bigl| 2\pi q i + \sum_{k=1}^{n-1} \frac{(-1)^{k-1} z^k}{k} - \log(1 + z) \Bigr| \le \frac{|z|^n}{n\,(1 - |z|)} \tag{E.8}$$

provided that $|z| < 1$ and $n > 0$.


In this book, Problem 13 is typically used when treating $(\log \circ \beta)(v)$ in a
situation where $\beta$ is continuous, $\beta(0) = 1$, and $|1 - \beta(u)| < 1$ for all $u$ between
0 and $v$, including $u = v$. Then $q$ in (E.8) should be taken equal to 0.
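
The truncation bounds of Problems 12 and 13 can be checked numerically. The sketch below assumes that the bounds read |z|^n/((n-1)!(n-|z|)) for the exponential series and |z|^n/(n(1-|z|)) for the logarithm series, and it takes q = 0 together with the principal logarithm from `cmath.log`, which is appropriate here because 1 + z has positive real part when |z| < 1. The function names and the sample point z are hypothetical.

```python
import cmath
import math

def exp_series_error(z, n):
    """Return (truncation error, bound) for e^z with n terms, assuming n > |z|."""
    partial = sum(z**k / math.factorial(k) for k in range(n))
    error = abs(cmath.exp(z) - partial)
    bound = abs(z) ** n / (math.factorial(n - 1) * (n - abs(z)))
    return error, bound

def log_series_error(z, n):
    """Return (truncation error, bound) for log(1 + z) with q = 0, assuming |z| < 1."""
    partial = sum((-1) ** (k - 1) * z**k / k for k in range(1, n))
    error = abs(partial - cmath.log(1 + z))
    bound = abs(z) ** n / (n * (1 - abs(z)))
    return error, bound

z = 0.3 + 0.4j  # |z| = 0.5
exp_err, exp_bnd = exp_series_error(z, 5)
log_err, log_bnd = log_series_error(z, 5)
print(exp_err, exp_bnd)
print(log_err, log_bnd)
```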

For fixed n, the bound in Problem 12 is not applicable for large |z| and the 
bound in Problem 13 is not very good for |z| close to 1. It is important that we 
have global bounds for the exponential function when the argument is restricted 
to being pure imaginary. The following inequality follows from (E.4) and (E.5): 


$$|e^{iv} - 1| \le |v| \wedge 2, \quad v \in \mathbb{R}. \tag{E.9}$$

Successive integrations of (E.9) yield inequalities that are often more useful
than (E.9) itself. Integration of (E.9) on $[0, v]$ combined with careful handling
of the absolute value symbols yields

$$|e^{iv} - 1 - iv| \le \frac{v^2}{2} \wedge (2|v|),$$

and a second integration yields

$$\Bigl| e^{iv} - 1 - iv + \frac{v^2}{2} \Bigr| \le \frac{|v|^3}{6} \wedge v^2, \quad v \in \mathbb{R}. \tag{E.10}$$
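
The global bounds for the exponential of a pure imaginary argument can be checked on a grid. The sketch below assumes the readings |e^{iv} - 1| ≤ min(|v|, 2), |e^{iv} - 1 - iv| ≤ min(v^2/2, 2|v|), and |e^{iv} - 1 - iv + v^2/2| ≤ min(|v|^3/6, v^2); the function names, the grid, and the rounding tolerance are hypothetical choices.

```python
import cmath

def remainders(v):
    """Absolute values of e^{iv} minus its Taylor polynomials of degree 0, 1, 2."""
    e = cmath.exp(1j * v)
    r0 = abs(e - 1)
    r1 = abs(e - 1 - 1j * v)
    r2 = abs(e - 1 - 1j * v + v * v / 2)
    return r0, r1, r2

def bounds(v):
    """The corresponding bounds, each the minimum of two simple expressions."""
    a = abs(v)
    return min(a, 2.0), min(a * a / 2, 2 * a), min(a**3 / 6, a * a)

grid = [k / 10 for k in range(-200, 201)]
worst = max(r - b
            for v in grid
            for r, b in zip(remainders(v), bounds(v)))
print(worst)  # expected to be at most a rounding error above 0
```

For small |v| the first expression in each minimum is the smaller one, and for large |v| the second takes over, which is what makes these bounds useful for all real v.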


E.3. Approximations of general C-valued functions 


An important feature of the following proposition is that it does not contain an
assumption that the $(n+1)$st derivative exists.


Proposition 3. Let $D$ denote the differentiation operator. Let $f$ be a $\mathbb{C}$-
valued function defined on an open interval in $\mathbb{R}$ containing 0. If $(D^n f)(0)$
exists, then

$$\lim_{x \to 0} \frac{f(x) - \sum_{k=0}^{n} \frac{(D^k f)(0)}{k!}\, x^k}{x^n} = 0.$$

PROOF. After $n - 1$ applications of the l'Hospital Rule (valid for $\mathbb{C}$-valued functions in
the numerator because it is valid for the real and imaginary parts separately)
the limit on the left becomes

$$\lim_{x \to 0} \frac{(D^{n-1} f)(x) - (D^{n-1} f)(0) - x\,(D^n f)(0)}{n!\,x},$$

which equals 0, by the definition of $(D^n f)(0)$. □


Problem 14. Why, in the preceding argument, was the l'Hospital Rule used only 
n — 1 times rather than n times? 


APPENDIX F 
Bibliography 


The first section consists of a list of general books each of which covers a broad 
range of topics in probability. The second section consists of more specialized 
books that treat deeply a few of the topics introduced in this book. There 
has been no attempt to make either of the two sections comprehensive, but we 
have tried to make each list representative of the literature. No books that are 
collections of articles have been included, although one particular article in one 
such collection is listed. 


General probability books 


Ash, Robert B., Real Analysis and Probability, Academic Press, New York, 1972. 

Bauer, Heinz, Probability Theory and Elements of Measure Theory, Academic Press, 
London, 1981. 

Billingsley, Patrick, Probability and Measure, Third Edition, John Wiley & Sons, New 
York, 1995. 

Breiman, Leo, Probability, Society for Industrial and Applied Mathematics, Philadel- 
phia, 1992. 

Chow, Yuan Shih and Teicher, Henry, Probability Theory: Independence,
Interchangeability, Martingales, Second Edition, Springer-Verlag, New York, 1988.

Chung, Kai Lai, A Course in Probability Theory, Second Edition, Academic Press, New 
York, 1974. 

De Finetti, Bruno, Theory of Probability Vol. 1, John Wiley & Sons, London, 1974. 

De Finetti, Bruno, Theory of Probability Vol. 2, John Wiley & Sons, London, 1975. 

Dudley, R. M., Real Analysis and Probability, Wadsworth & Brooks/Cole Advanced 
Books & Software, Pacific Grove, California, 1989. 

Durrett, Richard, Probability: Theory and Examples, Wadsworth & Brooks/Cole Ad- 
vanced Books & Software, Pacific Grove, California, 1991. 

Feller, William, An Introduction to Probability Theory and Its Applications, Vol. I, 
Third Edition, John Wiley & Sons, New York, 1968. 

Feller, William, An Introduction to Probability Theory and Its Applications, Vol. II,
Second Edition, John Wiley & Sons, New York, 1971. 




Galambos, Janos, Advanced Probability Theory, Marcel Dekker, New York, 1988. 

Gnedenko, B. V., The Theory of Probability, Second Edition (translated from Russian 
with additions), Chelsea, New York, 1962. 

Grimmett, Geoffrey and Stirzaker, David, Probability and Random Processes, Second 
Edition, Clarendon, Oxford, 1992. 

Itô, Kiyosi, Introduction to Probability Theory, Cambridge University Press,
Cambridge, 1984.

Kingman, J. F. C. and Taylor, S. J., Introduction to Measure and Probability, Cam- 
bridge University Press, Cambridge, 1966. 

Lamperti, John, Probability, W. A. Benjamin, New York, 1966. 

Loève, M., Probability Theory I, 4th Edition, Springer-Verlag, New York, 1977.

Loève, M., Probability Theory II, 4th Edition, Springer-Verlag, New York, 1977.

Moran, P. A. P., An Introduction to Probability Theory, Clarendon Press, Oxford, 1984. 

Nelson, Edward, Radically Elementary Probability Theory, Princeton University Press, 
Princeton, New Jersey, 1987. 

Port, Sidney C., Theoretical Probability for Applications, John Wiley & Sons, New 
York, 1994. 

Rényi, Alfred, Foundations of Probability, Holden Day, San Francisco, 1970. 

Shiryaev, A. N., Probability, Second Edition (translated from Russian, orig. 1989), 
Springer-Verlag, New York, 1996. 

Stroock, Daniel W., Probability Theory, an Analytic View, Cambridge University Press, 
Cambridge, 1993. 

Tucker, Howard G., A Graduate Course in Probability, Academic Press, New York, 
1967. 

Whittle, Peter, Probability, Penguin Books, Middlesex, England, 1970. 

Williams, David, Probability with Martingales, Cambridge University Press, Cam- 
bridge, 1991. 


General books on stochastic processes 


Bhattacharya, Rabin and Waymire, Edward C., Stochastic Processes with Applications, 
John Wiley & Sons, New York, 1990. 

Çinlar, Erhan, Introduction to Stochastic Processes, Prentice-Hall, Englewood Cliffs,
New Jersey, 1975. 

Cox, D. R. and Miller, H. D., The Theory of Stochastic Processes, Chapman and Hall, 
London, 1980. 

Dellacherie, Claude and Meyer, Paul-André, Probabilities and Potential A (translated 
from French, orig. 1976), North-Holland, Amsterdam, 1978. 

Doob, J. L., Stochastic Processes, John Wiley & Sons, New York, 1953. 

Gihman, I. I. and Skorohod, A. V., Theory of Stochastic Processes I, Springer-Verlag, 
New York, 1974. 

Gihman, I. I. and Skorohod, A. V., The Theory of Stochastic Processes II, Springer- 
Verlag, New York, 1975. 

Gihman, I. I. and Skorohod, A. V., The Theory of Stochastic Processes III, Springer- 
Verlag, Berlin, 1979. 

Gray, Robert M., Probability, Random Processes, and Ergodic Properties,
Springer-Verlag, New York, 1988.




Iranpour, Reza and Chacon, Paul, Basic Stochastic Processes, Macmillan, New York, 
1988. 

Jacod, J. and Shiryaev, A. N., Limit Theorems for Stochastic Processes, Springer- 
Verlag, Berlin, 1987. 

Karlin, Samuel and Taylor, Howard M., A First Course in Stochastic Processes, Second 
Edition, Academic Press, New York, 1975. 

Karlin, Samuel and Taylor, Howard M., A Second Course in Stochastic Processes, 
Academic Press, New York, 1981. 

Lindvall, Torgny, Lectures on the Coupling Method, John Wiley & Sons, New York, 
1992. 

Rao, M. M., Stochastic Processes: General Theory, Kluwer Academic Publishers, Dor- 
drecht, 1995. 

Resnick, Sidney I., Adventures in Stochastic Processes, Birkhäuser, Boston, 1992. 

Williams, David, Diffusions, Markov Processes, and Martingales, Vol. 1: Foundations,
John Wiley & Sons, Chichester, 1979. 


Books related to Chapter 12 


Lukacs, Eugene, Stochastic Convergence, Second Edition, Academic Press, New York, 
1975. 
Révész, Pál, The Laws of Large Numbers, Academic Press, New York, 1966. 


Books related to Chapter 13 


Hirschman, I. I. and Widder, D. V., The Convolution Transform, Princeton University 
Press, Princeton, New Jersey, 1955. 
Lukacs, Eugene, Characteristic Functions, Second Edition, Charles Griffin, London,
1970.

Lukacs, Eugene, Developments in Characteristic Function Theory, Macmillan, New 
York, 1983. 

Widder, D. V., An Introduction to Transform Theory, Academic Press, New York, 
1971. 


Books related to Chapters 14, 16, and 17 


Gnedenko, B. V. and Kolmogorov, A. N., Limit Distributions for Sums of Independent 
Random Variables, Addison-Wesley, Reading, Massachusetts, 1954. 

Gumbel, E. J., Statistics of Extremes, Columbia University Press, New York, 1958. 

Petrov, Valentin V., Limit Theorems of Probability Theory: Sequences of Independent 
Random Variables, Clarendon Press, Oxford, 1995. 

Zolotarev, V. M., One-dimensional Stable Distributions (translation from Russian, orig. 
1983), American Mathematical Society, Providence, Rhode Island, 1986. 


Books related to Chapter 15 


Deuschel, Jean-Dominique and Stroock, Daniel W., Large Deviations, Academic Press, 
Boston, 1989. 




Ellis, Richard S., Entropy, Large Deviations, and Statistical Mechanics, Springer- 
Verlag, New York, 1985. 

Varadhan, S. R. S., Large Deviations and Applications, Society for Industrial and Ap- 
plied Mathematics, Philadelphia, 1984. 


Books related to Chapters 18, 19, 24, and 33 


Billingsley, Patrick, Convergence of Probability Measures, John Wiley & Sons, New 
York, 1968. 

Dellacherie, Claude and Meyer, Paul-André, Probabilities and Potential B (translated 
from French, orig. 1980), North-Holland, Amsterdam, 1982. 

Durrett, Richard, Brownian Motion and Martingales in Analysis, Wadsworth Advanced 
Books & Software, Belmont, California, 1984. 

Durrett, Richard, Stochastic Calculus: A Practical Introduction, CRC Press, Boca 
Raton, Florida, 1996. 

Einstein, Albert, Investigations on the Theory of the Brownian Movement, Dover, New 
York, 1956. 

Freedman, David, Brownian Motion and Diffusion, Holden-Day, San Francisco, 1971. 

Friedman, Avner, Stochastic Differential Equations and Applications, Vol. 1, Academic 
Press, New York, 1975. 

Friedman, Avner, Stochastic Differential Equations and Applications, Vol. 2, Academic 
Press, New York, 1976. 

Gard, Thomas C., Introduction to Stochastic Differential Equations, Marcel Dekker, 
New York, 1988. 

Gihman, I. I. and Skorohod, A. V., Controlled Stochastic Processes, Springer-Verlag, 
New York, 1979. 

He, Sheng-wu and Wang, Jia-gang and Yan, Jia-an, Semimartingale Theory and Stochas- 
tic Calculus, CRC Press, Boca Raton, Florida, 1992. 

Hida, Takeyuki, Brownian Motion, Springer-Verlag, New York, 1980. 

Ikeda, Nobuyuki and Watanabe, Shinzo, Stochastic Differential Equations and
Diffusion Processes, Second Edition, Kodansha, Tokyo, 1989.

Itô, Kiyosi, Foundations of Stochastic Differential Equations in Infinite Dimensional 
Spaces, Society for Industrial and Applied Mathematics, Philadelphia, 1984. 

Itô, K. and McKean Jr., H. P., Diffusion Processes and Their Sample Paths, Springer- 
Verlag, Berlin, 1974. 

Krylov, N. V., Introduction to the Theory of Diffusion Processes, American Mathemat- 
ical Society, Providence, Rhode Island, 1995. 

Ledoux, Michel and Talagrand, Michel, Probability in Banach Spaces, Springer-Verlag, 
Berlin, 1991. 

Lukacs, Eugene, Stochastic Convergence, Second Edition, Academic Press, New York, 
1975. 

McKean Jr., H. P., Stochastic Integrals, Academic Press, New York, 1969. 

Metivier, Michel and Pellaumail, J., Stochastic Integration, Academic Press, New York, 
1980. 

Metivier, Michel, Semimartingales, a Course on Stochastic Processes, Walter de Gruyter, 
Berlin, 1982. 

Meyer, Paul A., Probability and Potentials, Blaisdell, Waltham, Massachusetts, 1966. 




Nualart, David, The Malliavin Calculus and Related Topics, Springer-Verlag, New 
York, 1995. 

Parthasarathy, K. R., Probability Measures on Metric Spaces, Academic Press, New 
York, 1967. 

Portenko, N. I., Generalized Diffusion Processes (translation from Russian, orig. 1982), 
American Mathematical Society, Providence, Rhode Island, 1990. 

Revuz, Daniel and Yor, Marc, Continuous Martingales and Brownian Motion, Second 
Edition, Springer-Verlag, Berlin, 1994. 

Rogers, L. C. G. and Williams, David, Diffusions, Markov Processes, and Martingales,
Vol. 2: Itô Calculus, John Wiley & Sons, Chichester, 1987. 

Skorohod, A. V., Asymptotic Methods in the Theory of Stochastic Differential Equations 
(translated from Russian, orig. 1987), American Mathematical Society, Provi- 
dence, Rhode Island, 1989. 

Stroock, D. W. and Varadhan, S. R. S., Multidimensional Diffusion Processes, Springer- 
Verlag, Berlin, 1979. 

Yeh, J., Stochastic Processes and the Wiener Integral, Marcel Dekker, New York, 1973. 

Yor, Marc, Some Aspects of Brownian Motion: Part I: Some Special Functionals, 
Birkhäuser Verlag, Basel, Switzerland, 1992.


Books related to Chapters 11, 25, 26, 30, and 31 


Athreya, K. B. and Ney, P. E., Branching Processes, Springer-Verlag, New York, 1972. 

Blumenthal, R. M. and Getoor, R. K., Markov Processes and Potential Theory, Aca- 
demic Press, New York, 1968. 

Chen, Mu Fa, From Markov Chains to Non-Equilibrium Particle Systems, World Sci- 
entific, Singapore, 1992. 

Chow, Y. S. and Robbins, Herbert and Siegmund, David, Great Expectations: The 
Theory of Optimal Stopping, Houghton Mifflin, Boston, 1971. 

Chung, Kai Lai, Markov Chains with Stationary Transition Probabilities, Second Edi- 
tion, Springer-Verlag, New York, 1967. 

Chung, Kai Lai, Lectures on Boundary Theory for Markov Chains, Princeton University 
Press, Princeton, New Jersey, 1970. 

Dellacherie, Claude and Meyer, Paul-André, Probabilities and Potential C (translated 
from French), Elsevier Science Publishers B. V., Amsterdam, 1988. 

Dellacherie, Claude and Meyer, Paul-André, Probabilités et Potentiel (Théorie du
potentiel associée à une résolvante, Théorie des processus de Markov), Hermann,
Paris, 1987.

Doob, J. L., Classical Potential Theory and its Probabilistic Counterpart, Springer- 
Verlag, New York, 1984. 

Doyle, Peter G. and Snell, J. Laurie, Random Walks and Electric Networks, Mathe- 
matical Association of America, 1984. 

Dynkin, E. B., Markov Processes, Vol. I, Academic Press, New York, 1965. 

Dynkin, E. B., Markov Processes, Vol. II, Academic Press, New York, 1965. 

Dynkin, Evgenii B. and Yushkevich, Alexsandr A., Markov Processes: Theorems and 
Problems, Plenum Press, New York, 1969. 

Edgar, G. A. and Sucheston, Louis, Stopping Times and Directed Processes
(Encyclopedia of Mathematics and Its Applications, Vol. 47), Cambridge University Press,
Cambridge, 1992.

Ethier, Stewart N. and Kurtz, Thomas G., Markov Processes: Characterization and 
Convergence, John Wiley & Sons, New York, 1986. 

Freedman, David, Approximating Countable Markov Chains, Holden-Day,
San Francisco, 1971.

Freedman, David, Markov Chains, Holden-Day, San Francisco, 1971. 

Harris, Theodore E., The Theory of Branching Processes, Dover, New York, 1989. 

Hughes, Barry D., Random Walks and Random Environments, Vol. 1: Random Walks, 
Clarendon Press, Oxford, 1995. 

Hughes, Barry D., Random Walks and Random Environments, Vol. 2: Random Envi- 
ronments, Clarendon Press, Oxford, 1996. 

Iosifescu, Marius, Finite Markov Processes and Their Applications, John Wiley & Sons,
Chichester, 1980. 

Kalashnikov, Vladimir V., Topics on Regenerative Processes, CRC Press, Boca Raton, 
Florida, 1994. 

Kemeny, John G. and Snell, J. Laurie and Knapp, Anthony W., Denumerable Markov 
Chains, D. van Nostrand, Princeton, New Jersey, 1966. 

Kingman, J. F. C., Regenerative Phenomena, John Wiley & Sons, London, 1972. 

Lawler, Gregory F., Intersections of Random Walks, Birkhäuser, Boston, 1991.

Maisonneuve, Bernard, Systèmes Régénératifs (Astérisque, Vol. 15), Société
Mathématique de France, Paris, 1974.

Révész, Pál, Random Walk in Random and Non-Random Environments, World
Scientific, Singapore, 1990.

Sharpe, Michael, General Theory of Markov Processes, Academic Press, Boston, 1988. 

Spitzer, Frank, Principles of Random Walk, Second Edition, Springer-Verlag, New 
York, 1976. 

Takács, Lajos, Combinatorial Methods in the Theory of Stochastic Processes, John
Wiley & Sons, New York, 1967. 

Yang, Xiang-qun, The Construction Theory of Denumerable Markov Processes, John 
Wiley & Sons, Chichester, 1990. 


Books related to Chapter 27 


Aldous, D. J., “Exchangeability and related topics”, Ecole d’Eté de Probabilités de 
Saint-Flour XIII, (ed. Hennequin, P. L.) 1-198, Springer-Verlag, Berlin, 1985. 


Books related to Chapter 28 


Furstenberg, Harry, Stationary Processes and Prediction Theory, Princeton University 
Press, Princeton, New Jersey, 1960. 

Hida, Takeyuki, Stationary Stochastic Processes, Princeton University Press, Princeton, 
New Jersey, 1970. 

Kahane, Jean-Pierre, Some Random Series of Functions, Second Edition, Cambridge 
University Press, Cambridge, 1985. 

Khinchin, A. I., Mathematical Foundations of Information Theory, Dover, New York, 
1957. 




Knight, Frank B., Foundations of the Prediction Process, Clarendon Press, Oxford, 
1992. 

Leadbetter, M. R. and Lindgren, Georg and Rootzén, Holger, Extremes and Related 
Properties of Random Sequences and Processes, Springer-Verlag, New York, 1983. 

Lifshits, M. A., Gaussian Random Functions, Kluwer Academic Publishers, Dordrecht, 
1995. 

Smythe, Robert T. and Wierman, John C., First-Passage Percolation on the Square 
Lattice, Springer-Verlag, Berlin, 1978. 

Walters, Peter, An Introduction to Ergodic Theory, Springer-Verlag, New York, 1982. 

Yaglom, A. M., Correlation Theory of Stationary and Related Random Functions, Vol 
I: Basic Results, Springer-Verlag, New York, 1987. 

Yaglom, A. M., Correlation Theory of Stationary and Related Random Functions, Vol 
II: Supplementary Notes and References, Springer-Verlag, New York, 1987.


Books related to Chapter 29 


Aldous, David, Probability Approximations via the Poisson Clumping Heuristic, Springer- 
Verlag, New York, 1989. 

Ambartzumian, R. V., Factorization Calculus and Geometric Probability (Encyclope- 
dia of Mathematics and Its Applications, Vol. 83), Cambridge University Press, 
Cambridge, 1990. 

Brémaud, Pierre, Point Processes and Queues: Martingale Dynamics, Springer-Verlag, 
New York, 1981. 

Cox, D. R. and Isham, Valerie, Point Processes, Chapman and Hall, London, 1980. 

Daley, D. J. and Vere-Jones, D., An Introduction to the Theory of Point Processes, 
Springer-Verlag, New York, 1988. 

Franken, Peter and König, Dieter and Arndt, Ursula and Schmidt, Volker, Queues and
Point Processes, John Wiley & Sons, Chichester, 1980. 

Hall, Peter, Introduction to the Theory of Coverage Processes, John Wiley & Sons, New 
York, 1988. 

Kallenberg, Olav, Random Measures, Akademie-Verlag, Berlin, 1983. 

Kingman, J. F. C., Poisson Processes, Clarendon Press, Oxford, 1993. 

Matheron, G., Random Sets and Integral Geometry, John Wiley & Sons, New York, 
1975. 

Matthes, Klaus and Kerstan, Johannes and Mecke, Joseph, Infinitely Divisible Point 
Processes, John Wiley & Sons, Chichester, 1978. 

Molchanov, Ilya S., Limit Theorems for Unions of Random Closed Sets, Springer- 
Verlag, Berlin, 1993. 

Resnick, Sidney I., Extreme Values, Regular Variation, and Point Processes, Springer- 
Verlag, New York, 1987. 

Reiss, R.-D., A Course on Point Processes, Springer-Verlag, New York, 1993. 

Santaló, Luis A., Integral Geometry and Geometric Probability (Encyclopedia of Math- 
ematics and Its Applications, Vol. 1), Addison-Wesley, Reading, Massachusetts, 
1976. 

Solomon, Herbert, Geometric Probability, Society of Industrial and Applied Mathemat- 
ics, Philadelphia, 1978. 




Stoyan, Dietrich and Stoyan, Helga, Fractals, Random Shapes and Point Fields: Meth-
ods of Geometrical Statistics, John Wiley & Sons, Chichester, 1994. 


Books related to Chapter 32 


Durrett, Richard, Lecture Notes on Particle Systems and Percolation, Wadsworth & 
Brooks/Cole Advanced Books & Software, Pacific Grove, California, 1988. 

Griffeath, David, Additive and Cancellative Interacting Particle Systems, Springer- 
Verlag, Berlin, 1979. 

Georgii, Hans-Otto, Gibbs Measures and Phase Transitions, Walter de Gruyter, Berlin, 
1988. 

Grimmett, Geoffrey, Percolation, Springer-Verlag, New York, 1989. 

Kesten, Harry, Percolation Theory for Mathematicians, Birkhäuser, Boston, 1982.

Khinchin, A. I., Mathematical Foundations of Statistical Mechanics, Dover, New York, 
1949. 

Kindermann, Ross and Snell, J. Laurie, Markov Random Fields and their Applications, 
American Mathematical Society, Providence, Rhode Island, 1980. 

Liggett, Thomas M., Interacting Particle Systems, Springer-Verlag, New York, 1985. 


Other books 


Adler, Robert J., The Geometry of Random Fields, John Wiley & Sons, Chichester, 
1981. 

Alon, Noga and Spencer, Joel H. and Erdős, Paul, The Probabilistic Method, John
Wiley & Sons, New York, 1992. 

Bollobás, Béla, Random Graphs, Academic Press, London, 1985.

David, H. A., Order Statistics, Second Edition, John Wiley & Sons, New York, 1981. 

Dubins, Lester E. and Savage, Leonard J., Inequalities for Stochastic Processes (How 
to Gamble if You Must), Dover, New York, 1976. 

Hida, Takeyuki and Hitsuda, Masuyuki, Gaussian Processes (translation from Japanese, 
orig. 1976), American Mathematical Society, Providence, Rhode Island, 1993. 

Kac, Mark, Statistical Independence in Probability Analysis and Number Theory, Math- 
ematical Association of America, 1959. 

Kolchin, Valentin F., Random Mappings, Optimization Software, New York, 1986. 

Maitra, Ashok P. and Sudderth, William D., Discrete Gambling and Stochastic Games, 
Springer-Verlag, New York, 1996.

Palmer, Edgar M., Graphical Evolution, John Wiley & Sons, New York, 1985. 

Piterbarg, Vladimir I., Asymptotic Methods in the Theory of Gaussian Processes and
Fields (translation from Russian, orig. 1988), American Mathematical Society, 
Providence, Rhode Island, 1996. 

Pinsky, Mark A., Lectures on Random Evolution, World Scientific, Singapore, 1991. 

Spencer, Joel, Ten Lectures on the Probabilistic Method, Second Edition, Society of 
Industrial and Applied Mathematics, Philadelphia, 1993. 

van der Vaart, Aad W. and Wellner, Jon A., Weak Convergence and Empirical Pro- 
cesses, Springer-Verlag, New York, 1996. 

Vanmarcke, Erik, Random Fields: Analysis and Synthesis, The MIT Press, Cambridge, 
Massachusetts, 1983. 


APPENDIX G 
Comments and Credits 


Some comments supplementing the text will be given, and sources in the litera- 
ture for further specific information will be identified. However, books that are 
closely related to significant portions of this book will not be cited here. Rather, 
there is an extensive list of such books in Appendix F. 


Concerning Chapter 1 


The axioms for a probability space given here are essentially the same as those given by 
Kolmogorov in 1933 [Kolmogorov, A. N., Foundations of Probability, Second English 
Edition, Chelsea, New York, 1956]. 

Parts (iii) and (iv) of Problem 12 are special cases of a problem discussed by
Martin Gardner in “Mathematical games” on pages 120-125 of the October 1974 issue of
Scientific American. For more about this problem, see Robert W. Chen, “A circu- 
lar property of the occurrence of sequence patterns in the fair coin-tossing process”, 
Advances in Applied Probability 21 (1989), 938-940. 


Concerning Chapter 2 


Proposition 7 plays an important role in probability theory, but the result may not even 
be mentioned in some measure theory courses. In probability theory one works with 
induced measures—that is, distributions—more often than with a probability measure 
on an underlying probability space. Sometimes the only probability measure identified 
in a discussion is called a ‘distribution’ since the communicator wants to indicate that 
he or she is thinking of this probability measure as possibly being induced. 


Concerning Chapter 3 


The adjective ‘Cauchy’ for distributions is somewhat ambiguous. See the comments 
in this appendix concerning Chapter 17 for further information about the adjective 
‘Cauchy’. 

A commonly used term for a distribution function F for $\overline{\mathbb{R}}$ is ‘improper distribution 
function’. Some use this term only when $F(\infty) < 1$ or $F(-\infty) > 0$. Since we do not 
use this term, we must always make sure that the setting $\mathbb{R}$ or $\overline{\mathbb{R}}$ is clear. 

We have defined distribution functions so that they are right-continuous. One could 
equally well define them to be left-continuous. If one were to do that, then the distri- 
bution functions corresponding to nonnegative random variables would be those whose 
value at 0 is 0. 

Some people work with distribution functions of $\mathbb{R}^d$-valued random variables for 
d > 1 as well as for d = 1. We feel that it is easier to work with distributions than with 
distribution functions when d > 1. 


Concerning Chapter 4 


With respect to operators, some might use the term ‘nonnegative’ where we use ‘posi- 
tive’, and ‘positive’ where we use ‘strictly positive’. 


Concerning Chapter 5 


The definition of the value of a probability generating function at 1 is a matter of 
taste. Many would take a different view from ours and define its value at 1 to be the 
value that the corresponding probability measure on $\overline{\mathbb{Z}}^+$ assigns to the set $\mathbb{Z}^+$. This 
approach produces probability generating functions that are continuous on the closed 
interval [0,1], but it also gives some probability generating functions that have finite 
derivatives at 1 even though the corresponding distributions have infinite mean. 
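
As a small illustration of the last point (the particular distribution is our own choice for this sketch, not one taken from the text), let Q assign probability 1/2 to each of the points 1 and $\infty$, and use the convention $s^{\infty} = 0$ for $0 \le s < 1$. Writing $\rho$ for the generating function under the alternative convention, one gets

$$\rho(s) = \tfrac{1}{2}\,s \quad (0 \le s < 1), \qquad \rho(1) = Q(\mathbb{Z}^+) = \tfrac{1}{2}, \qquad \rho'(1) = \tfrac{1}{2} < \infty,$$

so $\rho$ is continuous on [0,1] and has a finite derivative at 1, although the mean of Q is infinite because $Q\{\infty\} > 0$.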

The proof of Corollary 5 is the same proof as given on page 8 of the book by Kahane, 
identified in Appendix F. It also appears in the earlier (1968) edition of the same book. 
Kahane treats it as elementary, does not claim it is new, does not give it a name, and 
does not attribute it to anyone. 

For matrices, some would use the adjective ‘positive definite’ where we use ‘strictly 
positive definite’, and ‘nonnegative definite’ where we use ‘positive definite’. 

A good reference for generating functions is: Wilf, Herbert S., Generatingfunctionol- 
ogy, Academic Press, Boston, 1990. 

The book Enumerative Combinatorics, Vol. I, by Richard P. Stanley and published 
by Wadsworth & Brooks/Cole Advanced Books and Software, Monterey, California, 
1986, is a good reference for Stirling numbers and other topics in combinatorics. 


Concerning Chapter 6 


An argument along the lines of the proof of Corollary 5 of Chapter 4 is contained within 
the main proof by Simon Kochen and Charles Stone in “A note on the Borel-Cantelli 
Lemma”, Illinois Journal of Mathematics 8 (1964), 248-251. 

The Inclusion-Exclusion Theorem is more a combinatorial theorem than a probabil- 
ity theorem. It is treated as a separate topic in the book by Stanley which is identified 
in the comments concerning Chapter 5 in this appendix. 


Concerning Chapter 7 


W. Sierpiński in “Un théorème général sur les familles d'ensembles”, Fundamenta Math- 
ematicae 12 (1928), 206-210, proved a theorem along the lines of what we have called 
the Sierpiński Class Theorem. For Sierpiński, the requirement of being closed under 
countable disjoint unions plays the role of the requirement that we stated of being 
closed under countable increasing limits. To our knowledge, the first occurrence of the 
Sierpiński Class Theorem in the form we use it is in the book by Blumenthal and Getoor 
identified in Appendix F. It may be that E. B. Dynkin was the first to use the theorem 
extensively in probability theory. In his book, Theory of Markov Processes, Prentice- 
Hall, Englewood Cliffs, New Jersey, 1961, Dynkin calls the result the $\pi$-$\lambda$ Theorem, a 
name that is often used. 


Before 1960 the Monotone Class Theorem was often used. It says that the smallest 
monotone class containing a field is equal to the smallest $\sigma$-field containing that field, 
a monotone class being a family of sets that is closed under countable increasing and 
countable decreasing limits. See, for example: Halmos, Paul R., Measure Theory, 
Springer-Verlag, New York, 1974. The Monotone Class Theorem can be used to prove 
the following result: Let $\mu$ and $\nu$ be finite measures on a common measurable space 
$(\Omega, \mathcal{F})$. Suppose that $\mu(A) \le \nu(A)$ for all A in some field $\mathcal{E}$ for which $\mathcal{F} = \sigma(\mathcal{E})$. Then 
$\mu(B) \le \nu(B)$ for every $B \in \mathcal{F}$. The statement obtained by replacing “field $\mathcal{E}$” by 
“family $\mathcal{E}$ that is closed under finite intersections” is false, as is seen by letting $\Omega$ be 
some two-point set and $\mathcal{E}$ consist of a single one-point set $\{r\}$, and letting $\mu$ and $\nu$ be 
probability measures for which $\mu(\{r\}) < \nu(\{r\})$. If asked to provide a situation where 
the Monotone Class Theorem is useful and the Sierpiński Class Theorem is not, we 
would be hard-pressed to give an example other than the theorem just described. On 
the other hand, probability theory has many places where the Sierpiński Class Theorem 
is a better tool than is the Monotone Class Theorem. The term ‘monotone class 
theorem’ is used by some as a generic term indicating the Sierpiński Class Theorem, 
the Monotone Class Theorem, or other theorem of a similar sort. 


In an early manuscript version of this book, we gave a proof of the Extension The- 
orem that used the full recursively defined sequence of fields $(\mathcal{E}, \mathcal{E}_1, \mathcal{E}_2, \dots)$ and more; 
namely the recursive definition was extended transfinitely and it was shown that for 
some ordinal the corresponding field and its successor are identical. John Baxter showed 
us the alternative approach of completing $\mathcal{E}_2$. 


Concerning Chapter 8 


The Riemann integral has not played a role in our development of the Lebesgue in- 
tegral, although after having defined the Lebesgue integral with respect to Lebesgue 
measure, we have examined the relationship between the two integrals. An approach 
that begins with the Riemann integral, notices the linearity of the integral viewed as 
an operator, and then extends this operator to a wider class of functions than the 
Riemann-integrable functions is presented in Section 9.4 of: Kingman, J. F. C. and 
Taylor, S. J., Introduction to Measure and Probability, Cambridge University Press, 
Cambridge, 1966. An intermediate point of view is taken in: Rudin, Walter, Real and 
Complex Analysis, Second edition, McGraw-Hill, New York, 1974. Lebesgue integrals 
are defined there in a manner similar to that used in this book, but Lebesgue measure is 
constructed in a different manner. The Riesz Representation Theorem is used: it says 
that there is a unique measure having the property that integration with respect to it 
agrees with a given bounded linear operator—for example, the Riemann integral—on 
the space of continuous functions on a closed bounded interval. 


The argument that we have given that (iii) implies (i) in the proof of the Uniform 
Integrability Criterion is one we learned from Naresh Jain. 

A proof of the Stirling Formula similar to that outlined in Problem 19 through Prob- 
lem 23 is contained in: Patin, J. M., “A very short proof of Stirling’s Formula”, The 
American Mathematical Monthly 96 (1989), 41-42. In that paper the proof commences 
with a clever substitution that enables one to complete the proof by using the Domi- 
nated Convergence Theorem only once. More accurate approximations of the gamma 
function are available via asymptotic series. See, for example: Marsaglia, George and 
Marsaglia, John C. W., “A new derivation of Stirling’s Approximation”, The American 
Mathematical Monthly 97 (1990), 826-829. 
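
The gain from an asymptotic series can be seen numerically. The following is a minimal sketch, not taken from either cited paper: it compares n! with the leading-order Stirling approximation and with the same approximation multiplied by the first correction factor 1 + 1/(12n). The function names are our own.

```python
import math

def stirling(n: int) -> float:
    """Leading-order Stirling approximation of n!."""
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n

def stirling_corrected(n: int) -> float:
    """Stirling approximation times the first correction factor 1 + 1/(12n)."""
    return stirling(n) * (1 + 1 / (12 * n))

def relative_error(approx: float, exact: int) -> float:
    return abs(approx - exact) / exact

for n in (1, 5, 10, 20):
    exact = math.factorial(n)
    e0 = relative_error(stirling(n), exact)
    e1 = relative_error(stirling_corrected(n), exact)
    print(f"n={n:2d}  leading term: {e0:.2e}   with 1/(12n): {e1:.2e}")
```

The relative error of the leading term behaves roughly like 1/(12n), while the corrected version is smaller by about two orders of magnitude already at n = 10.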

The term ‘absolutely continuous’ is often used for certain distribution functions, 
namely those corresponding to measures that are absolutely continuous with respect 
to Lebesgue measure. It can be shown that the derivative of an absolutely continuous 
distribution function exists on the complement of a set having zero Lebesgue measure, 
and is a Radon-Nikodym derivative of the corresponding distribution with respect 
to Lebesgue measure. (Notice that in Problem 35 it is assumed that F’ is defined 
everywhere.) It can also be shown that every distribution function F for $\mathbb{R}$ can be 
written as $F = a_1 F_1 + a_2 F_2 + a_3 F_3$, where $0 \le a_j$ for $1 \le j \le 3$, $a_1 + a_2 + a_3 = 1$, $F_1$ 
is an absolutely continuous distribution function, $F_2$ and $F_3$ are distribution functions 
for which $F_2' = F_3' = 0$ a.e., $F_2$ corresponds to a distribution that assigns probability 
1 to a countable set, and $F_3$ is continuous. 


Concerning Chapter 9 


Persi Diaconis brought the zeta distribution to our attention. 

For information beyond that contained in Problem 55 and Problem 57 concerning 
independence of the radial and angular components of random vectors see: Jennrich, 
Robert I. and Port, Sidney C., “Radial and directional parts of a random vector”, 
Statistics & Probability Letters 6 (1987-8), 155-158. 


Concerning Chapter 10 


In some books, especially those not focused on probability theory, the term ‘convolu- 
tion’ has a meaning slightly different from the one given in this book, and there is a 
corresponding incompatibility of notations. 

It is not obvious how to generalize to $\mathbb{R}^d$ the definition given in the text of the 
support function p of a convex set. Here is a different definition: the support function 
$q\colon \mathbb{R}^d \setminus \{0\} \to \mathbb{R}$ of a compact convex subset A of $\mathbb{R}^d$ is given by 

$$q(z) = \sup\{\langle x, z\rangle : x \in A\}.$$

For $z \in \mathbb{R}^2$, regarded as a complex number $re^{i\varphi}$ with $r > 0$, it is clear that $q(z) = r\,p(\varphi)$, 
where p is the function introduced in the text as the support function. 
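
The homogeneity behind the relation between q and p can be checked numerically. The sketch below rests on assumptions of ours: A is taken to be a convex polygon given by its vertices (a hypothetical square), and p at an angle is taken to be q evaluated at the corresponding unit vector, which may differ in detail from the definition in the text.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]

def support(vertices: Sequence[Point], z: Point) -> float:
    """q(z) = sup{<x, z> : x in A}, where A is the convex hull of `vertices`.

    A linear functional on a polytope attains its supremum at a vertex,
    so taking the maximum over the vertex list is enough.
    """
    return max(x[0] * z[0] + x[1] * z[1] for x in vertices)

def from_polar(r: float, phi: float) -> Point:
    """The point of the plane corresponding to r * exp(i * phi)."""
    return (r * math.cos(phi), r * math.sin(phi))

# Hypothetical example set: the square [-1, 1] x [-1, 1].
square = [(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (1.0, 1.0)]

r, phi = 3.0, 0.7
q_value = support(square, from_polar(r, phi))
p_value = support(square, from_polar(1.0, phi))  # assumed meaning of p(phi)
print(q_value, r * p_value)  # the two numbers agree up to rounding
```

For this square the support function is the sum of the absolute values of the two coordinates of z, which gives an independent check of the output.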


Concerning Chapter 11 


Some solutions of Problem 29 appear as solutions of E2976 (proposed by Lee Whitt) 
in The American Mathematical Monthly 93 (1986), 62-63. The solutions there are 
analytic in character and apply for 0 < p < 2. 


Results for the random walk having steps equal to $\pm 1$ each with probability $\frac{1}{2}$ may 
be viewed as results in combinatorics. For instance, from Example 5 we see that there 
are $\frac{2}{m}\binom{2m-2}{m-1}$ length-$2m$ sequences consisting of $\pm 1$’s and having the property that 
the number of 1’s and the number of $-1$’s are equal in the entire sequence but not in 
any initial proper subsequence. 
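
The count just described can be confirmed by brute force for small m. The sketch below is our own check, not part of the text: it enumerates all sequences of +1's and -1's of length 2m and compares the number of those that first balance at the very end with the closed form (2/m) C(2m-2, m-1), the standard count of first returns to 0 at time 2m.

```python
from itertools import product
from math import comb

def first_return_count(m: int) -> int:
    """Number of +1/-1 sequences of length 2m whose partial sums are 0
    at time 2m but at no earlier positive time."""
    count = 0
    for steps in product((1, -1), repeat=2 * m):
        total = 0
        balanced_early = False
        for i, s in enumerate(steps, start=1):
            total += s
            if total == 0 and i < 2 * m:
                balanced_early = True
                break
        if not balanced_early and total == 0:
            count += 1
    return count

def closed_form(m: int) -> int:
    """(2/m) * C(2m-2, m-1); the division is exact because
    C(2m-2, m-1) / m is a Catalan number."""
    return 2 * comb(2 * m - 2, m - 1) // m

for m in range(1, 7):
    print(m, first_return_count(m), closed_form(m))
```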


Concerning Chapter 12 


The Cantor function plays an important role in courses on real analysis. Even though it 
is continuous, increasing, and not the constant function, its derivative equals 0 a.e. But 
it is in probability theory that functions with this ‘strange’ combination of properties 
arise most naturally. See Problem 18, for example. 

Many years ago Dean Isaacson and Willis Owen brought to the attention of one of 
us that the Kolmogorov Three-Series Theorem can be replaced by a two-series version. 
The definition of $Y_n$ needs to be changed for their version: $Y_n(\omega) = b$ if $X_n(\omega) > b$, 
$= -b$ if $X_n(\omega) < -b$, and otherwise $= X_n(\omega)$. Then the first of the three series can be 
dropped from the Kolmogorov Three-Series Theorem. 
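
The difference between the two truncations is easy to state in code. In this sketch the first function follows the definition just given; the second is the truncation commonly used in statements of the three-series theorem, which we assume (without having checked the text's own definition) sets the variable to 0 outside [-b, b]. Both function names are ours.

```python
def clip_truncation(x: float, b: float) -> float:
    """Two-series version: values beyond the levels -b and b are moved to those levels."""
    return max(-b, min(b, x))

def cutoff_truncation(x: float, b: float) -> float:
    """Assumed classical version: values outside [-b, b] are replaced by 0."""
    return x if abs(x) <= b else 0.0

for x in (-5.0, -1.5, 0.0, 1.5, 5.0):
    print(x, clip_truncation(x, 2.0), cutoff_truncation(x, 2.0))
```

The two functions agree on [-b, b] and differ only for values of absolute size larger than b.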


Concerning Chapter 13 


An approach different from that of Example 1 for calculating the characteristic function 
of the standard normal distribution is to make the substitution $z = x - iv$. One obtains 

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{ivx}\,e^{-x^2/2}\,dx = \frac{e^{-v^2/2}}{\sqrt{2\pi}}\int_{\gamma} e^{-z^2/2}\,dz,$$

where $\gamma$ is the horizontal left-to-right oriented line in the complex plane that passes 
through the points with imaginary part $-v$. One uses residue theory (a topic in complex 
analysis) to replace $\gamma$ by the real axis and thus finish the calculation. 

Moment generating functions can be defined for some distributions that are not sup- 
ported by $[0, \infty]$. The normal distribution is an example, since its density is sufficiently 
small far to the left on the negative portion of $\mathbb{R}$. Some people define the ‘moment 
generating function’ as $t \rightsquigarrow E(e^{tX})$, for all t for which the expectation is finite. Here t 
is the negative of the variable that we use in the text. We do not favor this approach 
because there are distributions supported by $\mathbb{R}^+$ for which this version of the moment 
generating function fails to exist for all positive t. 

There is inconsistency in the literature and tables concerning the definitions of the 
sine integral si and cosine integral ci. For instance, for the sine integral some use 0 
rather than $\infty$ as the fixed endpoint of integration. 
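
For concreteness, two conventions for the sine integral that are common in tables are shown below. Which of them matches the definition in the text should be checked there; the notation Si for the first is the usual one in the literature and is not taken from this book.

$$\operatorname{Si}(x) = \int_0^x \frac{\sin t}{t}\,dt, \qquad \operatorname{si}(x) = -\int_x^{\infty} \frac{\sin t}{t}\,dt.$$

Since $\int_0^{\infty} \frac{\sin t}{t}\,dt = \frac{\pi}{2}$, the two are related by $\operatorname{Si}(x) = \operatorname{si}(x) + \frac{\pi}{2}$, so a formula copied from one source into a setting that uses the other convention is off by a constant.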

For C-valued functions, some would use the adjective ‘positive definite’ where we use 
‘strictly positive definite’, and ‘nonnegative definite’ where we use ‘positive definite’. 


Concerning Chapter 14 


Many people attach an adjective to the convergence $F_n \to F$ that we have defined. 
Some of these adjectives are ‘vague’, ‘weak’, ‘weak*’, and ‘complete’. For some, two 
adjectives are used in order to distinguish the $\mathbb{R}$-setting from the $\overline{\mathbb{R}}$-setting. The fact 
that a distribution function F generates a linear functional $g \rightsquigarrow \int g\,dF$ on the space 
of continuous functions having finite limits at $\infty$ and $-\infty$ motivates some of the ter- 
minology. 


Concerning Chapter 15 


Of the strictly stable distributions on $\mathbb{R}^+$, only those of index 1/2 seem to have nice 
closed-form formulas for their densities. (Of course those of index 1, while not having 
densities, do have nice distribution functions.) 


Concerning Chapter 16 


Other functions $\tilde{\chi}$ can be used in lieu of $\chi$ as pictured in 16.1; $\tilde{\chi}$ must be continuous 
and satisfy 

$$|\tilde{\chi}(y) - \chi(y)| \le c\,(y^2 \wedge 1)$$

for some $c \in \mathbb{R}$. The continuity assumption can even be relaxed, but then the treatment 
of triangular arrays becomes slightly more complicated. 

There is a Lévy-Khinchin Representation Theorem for infinitely divisible distribu- 
tions in $\mathbb{R}^d$. One choice for the function $\chi$ is 

$$\chi(y) = \bigl([(y_1 \wedge 1) \vee (-1)], \dots, [(y_d \wedge 1) \vee (-1)]\bigr).$$

The measure $\nu$ on $\mathbb{R}^d \setminus \{0\}$ is a Lévy measure if and only if 

$$\int_{\mathbb{R}^d \setminus \{0\}} (|y|^2 \wedge 1)\,\nu(dy) < \infty.$$

The arbitrary real number $\eta$ and the nonnegative number $\sigma^2$ should be replaced by an 
arbitrary member of $\mathbb{R}^d$ and a positive definite symmetric matrix, respectively. 


Concerning Chapter 17 


The term asymmetric Cauchy is often used to describe the distributions in Theorem 10 
with $\gamma \ne 0$. When ‘Cauchy’ is used without modification it must be clear, implicitly or 
explicitly, whether all the distributions in Theorem 10 are encompassed or only those 
for which $\gamma = 0$. The phrase ‘symmetric Cauchy centered at b’ would indicate that 
$\gamma = 0$, $\xi = b$, and most likely $k \ne 0$. 

There are conditions on R that are equivalent to (17.21) being of regular variation 
of a particular index, conditions which are often separated into cases: $\alpha < 1$, $\alpha = 1$, 
and $\alpha > 1$. The advantage of such equivalent formulations is that they involve R more 
directly rather than as integrals of $s^2$ against R. It is not a trivial task to prove the 
equivalence of the various equivalent forms of Theorem 12, at least not without some 
key theorems about regular variation. 

A distribution R is said to be in the domain of partial attraction of a nondegenerate 
distribution Q if for some choice of $a_n$ and $c_n$, some subsequence of $(Q_n^{*n})$ converges 
to Q, where $Q_n$ is defined by $Q_n(B) = R(a_n B + c_n)$. It is known that only infinitely 
divisible distributions have nonempty domains of partial attraction and that there 
exist distributions that are in the domain of partial attraction of every nondegenerate 
infinitely divisible distribution. 


The Lévy measures on $\mathbb{R}^d \setminus \{0\}$ that correspond to stable distributions on $\mathbb{R}^d$ are 
the zero measure and those that can be represented as the product of some probability 
measure on the unit sphere centered at 0 and a radial measure of the form $c\,r^{-(1+\alpha)}\,dr$ 
for some $\alpha \in (0,2)$ and $c > 0$. In order that the stable distribution be spherically 
symmetric, the probability measure on the unit sphere must be spherically symmetric. 
The characteristic functions of the spherically symmetric stable distributions in $\mathbb{R}^d$ are 
the functions of the form $u \rightsquigarrow \exp\{-k|u|^{\alpha}\}$ for some $\alpha \in (0, 2]$ and $k > 0$. 


Concerning Chapter 18 


It is not much more difficult to prove the full Prohorov Theorem than it is to show 
that a family consisting of a single probability measure is uniformly tight. An outline 
of an early proof of this latter fact can be found in the third footnote of Oxtoby, J. C. 
and Ulam, S. M., “On the existence of a measure invariant under a transformation”, 
Annals of Mathematics 40 (1939), 560-566. 

The limit theory for triangular arrays of $\mathbb{R}^d$-valued random variables is very similar 
to that for triangular arrays of R-valued random variables. 


Concerning Chapter 19 


Often the $\sigma$-fields in a minimal filtration or a minimal right-continuous filtration are 
made larger by completing them. 

Let W denote standard Brownian motion on $[0, \infty)$. For $t \in [0,1]$ and $n = 3, 4, 5, \dots$, 
set 

$$Z_n(t) = \frac{W_{nt}}{\sqrt{2n \log(\log n)}}.$$

Fix t. By the Law of the Iterated Logarithm at $\infty$ and a little more work, the set of limit 
points of the sequence $(Z_n(t): n = 3, 4, \dots)$ is the interval $[-\sqrt{t}, \sqrt{t}]$ with probability 
1. If we consider t to be a variable, it becomes natural to ask: Which functions 
are limit points of the sequence $(Z_n: n = 3, 4, \dots)$? Here also, an almost sure result 
exists. In Strassen, V., “An invariance principle for the Law of the Iterated Logarithm”, 
Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 3 (1964), 211-226, it 
is shown that the limit points are the absolutely continuous functions x on [0, 1] that 
satisfy $x(0) = 0$ and 

$$\int_{[0,1]} [x'(t)]^2\,\lambda(dt) \le 1,$$

where $\lambda$ denotes Lebesgue measure. 
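
As a consistency check (our own remark, not part of Strassen's paper), the functional description recovers the interval of limit points at a fixed time. If x is absolutely continuous with $x(0) = 0$ and $\int_{[0,1]} [x'(s)]^2\,\lambda(ds) \le 1$, then the Cauchy-Schwarz inequality gives, for each $t \in (0,1]$,

$$|x(t)| = \left|\int_{[0,t]} x'(s)\,\lambda(ds)\right| \le \sqrt{t}\,\left(\int_{[0,t]} [x'(s)]^2\,\lambda(ds)\right)^{1/2} \le \sqrt{t},$$

and the bound is attained by $x(s) = (s \wedge t)/\sqrt{t}$, for which $\int_{[0,1]} [x'(s)]^2\,\lambda(ds) = t \cdot \frac{1}{t} = 1$ and $x(t) = \sqrt{t}$. Multiplying this function by constants in $[-1,1]$ fills out the whole interval from $-\sqrt{t}$ to $\sqrt{t}$.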

Even though the Law of the Iterated Logarithm at $\infty$ is equivalent to the existence 
of $T$ and $T_1 < T_2 < \cdots \nearrow \infty$ for which (19.18) and (19.19) hold, it does not tell us 
whether there exists $T_1 < T_2 < \cdots \to \infty$ such that 

$$W_{T_n}(\omega) > \sqrt{2T_n(\omega)\log(\log T_n(\omega))}.$$

It develops that the answer is ‘yes’. Then the question arises: What is the answer if 
$\sqrt{2t \log(\log t)}$ is replaced by some $\varphi(t)$ for which $\varphi(t)/\sqrt{2t\log(\log t)} \to 1$ as $t \to \infty$? 
With the assumption that $t \rightsquigarrow t^{-1}\varphi(t)$ is decreasing and $t \rightsquigarrow t^{-1/2}\varphi(t)$ is increasing 
(but without the limiting assumption at the end of the last sentence), the probability 
that there is a random sequence $T_1 < T_2 < \cdots \to \infty$ such that $W_{T_n}(\omega) > \varphi(T_n(\omega))$ is 
1 or 0, according as 

$$\int^{\infty} t^{-3/2}\,\varphi(t)\,e^{-[\varphi(t)]^2/(2t)}\,dt$$

diverges or converges. This criterion is known as the Kolmogorov Test. 
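
A short worked case, added here as an illustration and using the usual form of the test integral, $\int^{\infty} t^{-3/2}\varphi(t)\,e^{-[\varphi(t)]^2/(2t)}\,dt$, shows how the test recovers the Law of the Iterated Logarithm. The one-parameter family $\varphi_c$ is our own choice. For $c > 0$ let $\varphi_c(t) = \sqrt{2c\,t\log(\log t)}$, for which the two monotonicity assumptions hold for all large t. Then $[\varphi_c(t)]^2/(2t) = c\log(\log t)$, so

$$t^{-3/2}\,\varphi_c(t)\,e^{-[\varphi_c(t)]^2/(2t)} = \frac{\sqrt{2c\log(\log t)}}{t\,(\log t)^{c}},$$

and the substitution $u = \log t$ turns the integral into $\int^{\infty} \sqrt{2c\log u}\;u^{-c}\,du$, which diverges when $c \le 1$ and converges when $c > 1$. The case $c = 1$ gives the answer ‘yes’ mentioned above.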


Concerning Chapter 20 


The spaces $L_1(\Omega, \mathcal{F}, P)$ and $L_2(\Omega, \mathcal{F}, P)$ belong to a family of metric spaces. For 
$1 \le p < \infty$, the metric space $L_p(\Omega, \mathcal{F}, P)$ consists of equivalence classes of those 
random variables X for which $E(|X|^p) < \infty$, with the distance between [X] and [Y] 
equaling $[E(|Y - X|^p)]^{1/p}$. The metric space $L_\infty(\Omega, \mathcal{F}, P)$ consists of equivalence classes 
of almost surely bounded random variables, with the distance between [X] and [Y] 
equaling the smallest almost sure bound of $|Y - X|$. The only one of these metric 
spaces that is a Hilbert space is $L_2$. 
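
One way to see the last claim in a concrete case is through the parallelogram law, which holds for a norm exactly when the norm comes from an inner product. The sketch below is a small check of our own on a hypothetical two-point probability space with equally likely outcomes; the function names are not from the text.

```python
import math

def lp_norm(values, probs, p):
    """Norm of a random variable on a finite probability space.

    values[i] is the value at outcome i, which has probability probs[i];
    p is a real number >= 1 or math.inf.
    """
    if p == math.inf:
        return max(abs(v) for v, q in zip(values, probs) if q > 0)
    return sum(q * abs(v) ** p for v, q in zip(values, probs)) ** (1 / p)

def parallelogram_gap(x, y, probs, p):
    """||x+y||^2 + ||x-y||^2 - 2||x||^2 - 2||y||^2; zero for every x and y
    exactly when the norm is induced by an inner product."""
    s = [a + b for a, b in zip(x, y)]
    d = [a - b for a, b in zip(x, y)]
    lhs = lp_norm(s, probs, p) ** 2 + lp_norm(d, probs, p) ** 2
    rhs = 2 * lp_norm(x, probs, p) ** 2 + 2 * lp_norm(y, probs, p) ** 2
    return lhs - rhs

probs = [0.5, 0.5]          # hypothetical two-point space
x, y = [1.0, 0.0], [0.0, 1.0]
for p in (1, 2, 3, math.inf):
    print(p, parallelogram_gap(x, y, probs, p))
```

The gap vanishes (up to rounding) only for p = 2. A single failing pair x, y is enough to rule out an inner product for the other values of p, at least on this particular space.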


Concerning Chapter 21 


The adjective ‘regular’ is often attached to the phrase ‘conditional distribution’ in 
order to emphasize that it is a random distribution rather than just a collection of 
conditional probabilities of events determined by some random variable. The term 
‘regular conditional probability’ is a related term that appears in the literature. It 
refers to a random probability measure on the underlying probability space having 
certain properties. If we had introduced this concept in the text, we would have used 
the term ‘conditional probability measure’. 

The proof we have given of Lemma 21 is based on ideas obtained from Ashok Maitra 
and William Sudderth in conversations with them. 

There are different approaches for treating conditional distributions of normal ran- 
dom vectors given some of their coordinates. We have learned the one presented in the 
last section of Chapter 21 from Morris L. Eaton. The interested reader may want to 
read the third chapter of his book Multivariate Statistics, a Vector Space Approach, 
John Wiley & Sons, New York, 1983. 


Concerning Chapter 22 


Example 1 can be extended. For instance, 

$$P([X_0 = H] \mid [X_1 = B]) = \frac{P[X_0 = H, X_1 = B]}{P[X_1 = B]} = \frac{P[X_0 = H, X_1 = B]}{P[X_0 = H, X_1 = B] + P[X_0 = T, X_1 = B]} = \cdots$$


Situations where one is given conditional distributions in one direction—say for the fu- 
ture given the past—and one wants to calculate conditional distributions in the other 
direction are quite common. The Bayes Theorem is a formula that gives the desired 
conditional probabilities in terms of the given probabilities and conditional probabili- 
ties. Some books have many examples and exercises focused on the Bayes Theorem. 
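
A minimal sketch of such a calculation follows. The numbers are hypothetical and are not those of Example 1: the state names H and T and the observation B are borrowed from it, while the second observation label W, the prior, and the conditional probabilities are all made up for the illustration.

```python
from fractions import Fraction

def bayes_posterior(prior, likelihood, observed):
    """Return P[X0 = s | X1 = observed] for every state s.

    prior[s]            = P[X0 = s]
    likelihood[s][obs]  = P[X1 = obs | X0 = s]
    """
    joint = {s: prior[s] * likelihood[s][observed] for s in prior}
    total = sum(joint.values())  # P[X1 = observed]
    return {s: j / total for s, j in joint.items()}

# Hypothetical inputs, chosen only for illustration.
prior = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
likelihood = {
    "H": {"B": Fraction(1, 3), "W": Fraction(2, 3)},
    "T": {"B": Fraction(3, 4), "W": Fraction(1, 4)},
}

posterior = bayes_posterior(prior, likelihood, "B")
print(posterior["H"])  # prints 4/13
```

Exact fractions are used so that the result can be compared directly with a hand computation: (1/6) / (1/6 + 3/8) = 4/13.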


Concerning Chapter 23 


In “Conditional expectations of random variables without expectations”, Annals of 
Mathematical Statistics 36 (1965), 1556-1559, R. E. Strauch defines conditional ex- 
pectations of random variables that do not necessarily have finite expectations. This 
definition does not use conditional probabilities but is equivalent to our definition. 


Concerning Chapter 24 


The Kolmogorov Inequality can be viewed as a generalization to martingales of Markov- 
type inequalities for random variables. Broad classes of inequalities can be generalized 
in this manner as shown by Gilat, P. and Sudderth, W. D., in “Generalized Kolmogorov 
inequalities for martingales”, Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte 
Gebiete 36 (1976), 67-73. 


The equation (24.21) appears (with the names of the parameters changed) in Prob- 
lem 18 of Chapter 12 for a different reason than that for which it appears in Chapter 24. 


Concerning Chapter 25 


The coupling proof we have presented of the Renewal Theorem in the case of positive 
recurrence is similar to the proof in Lindvall, Torgny, “A probabilistic proof of Black- 
well’s renewal theorem”, The Annals of Probability 5 (1977), 482-485. Although David 
Freedman does not mention coupling in the proof he presents in Markov Chains which 
is listed in Appendix F, the proof we have given has a strong resemblance to his for 
both the positive recurrent and null recurrent cases. 


Denote by $\{-a, (b-a)\}$ the support of the step distribution of a random walk in $\mathbb{Z}$ 
whose steps are of Bernoulli type having mean 0. For the case a = 1 the distribution of 
the first return to 0 is explicitly obtained in terms of b in: Gould, H. W., “Generaliza- 
tions of Vandermonde’s convolution”, The American Mathematical Monthly 63 (1956), 
84-91. In “A pentagonal pot-pourri of perplexing problems, primarily probabilistic”, 
The American Mathematical Monthly 91 (1984), 559-563, Richard K. Guy indicated 
(referring to a problem proposed by Kai-Lai Chung) that an explicit formula for general 
a and b might not have yet been found. (There is no loss of generality in assuming that 
a and b are relatively prime.) 


Concerning Chapter 26 


The term ‘Markov chain’ is used by some as a synonym for ‘Markov sequence’, although 
others use ‘Markov chain’ only when the state space is countable. 


For results such as Corollary 15 describing convergence of probabilities as time 
approaches $\infty$, it is natural to ask about the rate of convergence. There are a large 
number of papers in the probability literature about this issue. For the Ehrenfest urn 
sequence of Problem 55 modified slightly to remove the periodicity, Persi Diaconis 
and Mehrdad Shahshahani indicate in Example 1 of “Time to reach stationarity in 
the Bernoulli-Laplace diffusion model”, SIAM Journal on Mathematical Analysis 18 
(1987), 208-218, that starting with b/2 balls in each urn, it is at time approximately 
b log b that the probability distribution becomes close to the equilibrium distribution. 


Concerning Chapter 27 


When people speak of the De Finetti Theorem they usually mean a theorem for the 
case of infinite exchangeable sequences. The proof of Theorem 7 we have given uses a 
finite sequence approximation similar to that by Heath, David and Sudderth, William, 
in “De Finetti’s Theorem on Exchangeable Variables”, The American Statistician 30 
(1976), 188-189. In place of our identification of a reverse martingale first in Proposi- 
tion 4 and then in Lemma 5 leading to a proof of Proposition 6, they use a tightness 
argument. 

The GEM distributions introduced in Problem 53 have been named after McCloskey, 
J. W. [A Model for the Distribution of Individuals by Species in an Environment, PhD 
thesis (1965), Michigan State University], Engen, S. [“A note on the geometric series 
as a species frequency model”, Biometrika 62 (1975), 694-699], and Griffiths, R. C. 

In Blackwell, David and MacQueen, James B., “Ferguson distributions via Pólya urn 
schemes”, The Annals of Statistics 1 (1973), 353-355, the Blackwell-MacQueen urns 
are described, though not by that name. This reference might be the first to contain 
the term ‘Ferguson distribution’. These distributions are called ‘Dirichlet processes’ 
where they are introduced in Ferguson, Thomas S., “A Bayesian analysis of some 
nonparametric problems”, The Annals of Statistics 1 (1973), 209-230. 


Concerning Chapter 28 


Some people call the spectral measure of a second-order stationary sequence its ‘spectral 
distribution’, even though its total measure is $\operatorname{Var}(X_0)$, which need not equal 1. 


Concerning Chapter 29 


In loose mathematical conversation there is a tendency to use the term ‘set’ when 
the term ‘multiset’ would be better. For instance, a conversation may begin with: 
“Consider a set $\{X_1, X_2, \dots, X_n\}$ of n random variables.” Then later one might speak 
of the sum of these random variables, even though one does not intend to exclude the 
possibility that there may be identical random variables in the list and one intends that 
repeated copies of the same random variable be repeated in the sum. 

In the general framework of random sets, the distribution of a random set X is the 
function $K \rightsquigarrow P[K \cap X \ne \emptyset]$ for K in an appropriate class of sets. For instance, see 
the book by Molchanov listed in Appendix F. 


Concerning Chapter 30 


The martingale approach we have outlined for proving Theorem 9 is the one used by E. 
S. Shtatland in “On local properties of processes with independent increments”, Theory 
of Probability and its Applications 10 (1965), 317-322. 

Our definition of ‘regenerative set’ is narrower than the one typically used. In 
particular, our definition does not encompass the set of times when a Brownian motion 
is at 0, even though an appropriately general definition would apply to this situation. 
The set of times when Brownian motion is at 0 is uncountable with probability 1 despite 
the fact that at any particular time, Brownian motion has probability 0 of being at 0. 

The short term ‘local time’ is usually used for ‘local-time process’. 


Concerning Chapter 31 


The definition of ‘infinitesimal generator’ varies from treatment to treatment. One 
place of disagreement is the sense in which the limit in Definition 13 must exist. 

Problem 20 and the fact that $p_x\{x\} = 0$ point to an alternate construction of 
pure-jump Markov processes, which one might view as more natural. The measure $p_x$ 
describes the place to which the process jumps from x at the time it does jump, and q(x) 
is the expected value of the exponentially distributed duration of the stay at x. Thus 
one can construct X recursively: alternately wait an exponentially distributed amount 
of time with mean equal to q(current state), and jump according to $p_{\text{current state}}$. 
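
The recursive construction can be sketched in a few lines. This is an illustration under assumptions of our own, not the construction carried out in the text: the holding-time mean is passed in as a function (following the wording above, where q gives the mean), the jump distribution is passed in as a sampler that never returns the current state, and the example chain on the integers is hypothetical.

```python
import random

def simulate_pure_jump(x0, mean_hold, jump, t_max, rng):
    """Simulate a pure-jump process on [0, t_max].

    mean_hold(x): mean of the exponential holding time at state x.
    jump(x, rng): draws the next state from the jump distribution at x,
                  assumed to put no mass on x itself.
    Returns a list of (time, state) pairs: the start and every jump.
    """
    t, x = 0.0, x0
    path = [(t, x)]
    while True:
        # expovariate takes a rate, which is the reciprocal of the mean.
        t += rng.expovariate(1.0 / mean_hold(x))
        if t > t_max:
            return path
        x = jump(x, rng)
        path.append((t, x))

# Hypothetical example: nearest-neighbour jumps on the integers, with
# mean holding time 1 at even states and 0.5 at odd states.
def example_mean_hold(x):
    return 1.0 if x % 2 == 0 else 0.5

def example_jump(x, rng):
    return x + rng.choice((-1, 1))

path = simulate_pure_jump(0, example_mean_hold, example_jump, 10.0, random.Random(0))
print(path[:5])
```

The sketch ignores the possibility of infinitely many jumps in finite time, which cannot occur in this example because the mean holding times are bounded away from 0.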


Concerning Chapter 32 


Continuous-time interacting particle systems of the type considered here were first 
introduced by Frank Spitzer in “Interaction of Markov processes”, Advances in Math- 
ematics 5 (1970), 246-290. Thomas Liggett was the first to give a rigorous existence 
proof for such systems, in “Existence theorems for infinite particle systems”, Trans- 
actions of the American Mathematical Society 165 (1972), 471-481. Liggett’s result 
treats more general systems than the ones we discuss; in particular, the finite-range as- 
sumption can be considerably weakened. Discrete-time versions of these systems were 
studied extensively in Russia during the late 1960’s. For bibliographic references, see 
the book by Liggett listed in Appendix F. 

In most treatments, when the maximum particle number is n = 1, the term ‘birth’ 
is reserved for the transition $0 \to 1$ at a site, and the term ‘death’ is reserved for the 
transition $1 \to 0$. In our treatment, the transition $0 \to 1$ at a site can also be described 
as a ‘death’ (with wrap-around), and the transition $1 \to 0$ can be described as a birth, 
so there is an ambiguity in the way the rates of such transitions are divided between 
the birth and death rates. In most of our examples with n = 1, we have maintained 
consistency with other treatments by making the birth rates 0 at occupied sites and 
the death rates 0 at vacant sites. 

The original subadditive ergodic theorem of J. F. C. Kingman, in “The ergodic 
theory of subadditive stochastic processes”, Journal of the Royal Statistical Society 
B 30 (1968), 499-510, does not apply to Example 8, since one of the conditions in 
that theorem requires that the distribution of the doubly indexed random sequence 
$(Z_{m+k,n+k}: 0 \le m < n)$ be the same for all positive integers k. Liggett’s improvement, 
which appeared in “An improved subadditive ergodic theorem”, Annals of Probability 
13 (1985), 1279-1285, was designed with applications like Example 8 in mind. The 
limit theorem in Example 8 is originally due to Richard Durrett, “On the growth of 
one dimensional contact processes”, Annals of Probability 8 (1980), 890-907. 

Given a generator G, equation (32.2) is usually impossible to solve for $\mu$. On the 
other hand, given a probability measure $\mu$ satisfying certain mild conditions, there is 
a way to construct an interacting particle system (not necessarily with finite-range 
rates) with equilibrium distribution $\mu$. This fact was Spitzer’s original motivation for 
introducing interacting particle systems. 

Many years ago David Griffeath told one of us of the example appearing in Prob- 
lem 16. It was described by David Blackwell in “Another countable Markov process 
with only instantaneous states”, Annals of Mathematical Statistics 29 (1958), 313-316. 
A somewhat similar example was given by W. Feller and Henry McKean in “A diffu- 
sion equivalent to a countable Markov chain”, Proceedings of the National Academy of 
Sciences, U. S. A. 42 (1956), 351-354, and by R. L. Dobrushin in “An example of a 
countable homogeneous Markov process all states of which are transient”, Theory of 
Probability and its Applications 1 (1956), 436-440. 


Concerning Chapter 33 


Since we have restricted the integrands in Itô integrals to be cadlag functions we have 
not had to impose moment conditions. Some other treatments do not make the cadlag 
assumption but do impose moment conditions. Care should be taken when comparing 
results obtained under different assumptions. Example 2 shows why it is important to 
consider integrands that are not cadlag functions. 

The original proof of existence and uniqueness of solutions of stochastic differential 
equations, given by Kiyosi Itô in “On a stochastic integral equation”, Proceedings of 
the Japan Academy 22 (1946), 32-35, used a stochastic version of the ‘Picard iteration 
method’ from the field of ordinary differential equations. 

See the remark earlier in this appendix with reference to the term ‘local-time process’ 
that appears in Chapter 30 as well as this chapter. 


Concerning Appendix A 


We avoid using the phrase ‘range of the function’, because some people take it to mean 
‘image of the function’ and others take it to mean ‘target of the function’. 


Concerning Appendix B 


Two books that introduce metric spaces without treating them as examples of topologi- 
cal spaces are: (i) Reisel, Robert B., Elementary Theory of Metric Spaces: A Course in 
Constructing Mathematical Proofs, Springer-Verlag, New York, 1982 and (ii) Copson, 
E. T., Metric Spaces, Cambridge at the University Press, Cambridge, 1968. 


Concerning Appendix C 


An elementary book about general topological spaces is Moore, Theral O., Elementary 
General Topology, Prentice-Hall, Englewood Cliffs, New Jersey, 1964. It uses open 
sets for the starting point. For an approach that begins with neighborhoods see the 
first three chapters of: Wallace, Andrew H., An Introduction to Algebraic Topology, 
Pergamon Press, New York, 1957. In this latter book, neighborhoods are not nec- 
essarily open; many books on topology define neighborhood in such a way that all 
neighborhoods are open. 


Concerning Appendix D 


The definition of Riemann-Stieltjes integral used in this appendix has been taken from: 
Apostol, Tom M., Mathematical Analysis, A Modern Approach to Calculus, Addison- 
Wesley, Reading, Massachusetts, 1964, a book which is a good reference for analysis at 
the pre-measure-theoretic level. 


Concerning Appendix E 


It is possible to regard log as a bijective function having nice properties provided one
uses an appropriate domain and an appropriate target. Take the target equal to C. To
construct the domain start with a copy of C \ {0} corresponding to each member of
Z, using the members of Z as subscripts in order to distinguish the copies. Cut each
of these sets along its negative real axis. For each n ∈ Z, attach the second quadrant
of (C \ {0})_n to the third quadrant of (C \ {0})_{n+1} along the cuts. The object thus
constructed is called a Riemann surface.

To define log on the Riemann surface just described, consider a point (z,n) on it,
where z is a complex number different from 0 and n identifies the copy of C \ {0} on
which it lies, regarding z as a second-quadrant complex number in case it is a negative
real number. Set log(z,n) equal to that version of log z whose coefficient of i lies in
the interval (2πn − π, 2πn + π]. It can be checked that log is a continuous one-to-one
function from the Riemann surface onto C satisfying e^{log(z,n)} = z.
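
As a rough numerical illustration of the definition just given, the following Python sketch computes log(z,n) for a point of the Riemann surface. The function name log_on_sheet and the representation of a point as a pair consisting of a Python complex number and an integer are assumptions made here for illustration; they are not notation from the text, and the sketch ignores floating-point rounding very close to the cut.

```python
import cmath
import math


def log_on_sheet(z: complex, n: int) -> complex:
    """Illustrative value of log at the point (z, n): the logarithm of z whose
    imaginary part lies in the interval (2*pi*n - pi, 2*pi*n + pi]."""
    z = complex(z)
    if z == 0:
        raise ValueError("log is not defined at 0")
    # atan2 returns an angle in [-pi, pi]. A negative real number is regarded
    # as a second-quadrant point, so its angle is taken to be +pi, never -pi.
    if z.imag == 0 and z.real < 0:
        theta = math.pi
    else:
        theta = math.atan2(z.imag, z.real)
    # The real part is log|z|; the sheet index n shifts the angle by 2*pi*n.
    return complex(math.log(abs(z)), theta + 2 * math.pi * n)


if __name__ == "__main__":
    w = log_on_sheet(1 + 1j, -3)
    # Exponentiating recovers z, whichever sheet the point lies on.
    print(w, cmath.exp(w))
```

Points with the same z on different sheets receive different logarithms, which is what makes the function one-to-one on the surface.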


Index


0-1 trivial σ-field, 561
W-functional
nonanticipating, 663
square-integrable, 667
L1-distance, 401
L1-norm, 401
L1(Ω, F, P), 401
L2-distance, 399
L2-norm, 399
L2(Ω, F, P), 398
L (Ω, F, P), 730
C[0,1] as a metric space, 696
∅, 687
μ-a.e., 104
π-λ Theorem, 725
R as a metric space, 696
σ-field, 6
Borel, 9
exchangeable, 197, 198
infinite product, 137
product, 128
shift-invariant, 555
tail, 196, 198, 555
trivial, 197
0-1, 197
trivial 0-1, 561
σ-finite measure, 83
wrap-around
in particle system, 642
a.e., 689
a.s., 689
a.s.-defined random variable, 48
i.p., 689
0-1 Law
Blumenthal, 610
0-1 law
Hewitt-Savage, 197
Kolmogorov, 196
0-1 trivial σ-field, 197


a.e., 104
a.s., 13
absolute moments, 68
about mean, 68
absolute value
of complex number, 688
of real number, 688
absolute value of complex number, 711
absolutely continuous, 115, 726 
absorbing state, 531 
absorption probability, 531 
accessible state, 527, 639 
adapted, 171, 382 
additive 

countably, 7 

finitely, 7 
almost everywhere, 104, 689 
almost sure convergence 

of point processes, 598 
almost surely, 13, 689 
aperiodic 

Markov sequence, 527 

renewal sequence, 503 

state, 525 

transition operator, 527 
arcsin density, 37 
arcsin distribution, 37 
Arcsin Law, 379 
Arzela-Ascoli theorem, 696 
Arzela-Ascoli Theorem, 373 
ascending ladder time 

of random walk, 507 
asexual reproduction 

in particle system, 651 
asymmetric Cauchy distribution, 728 
asymptotic formula, 113 
atomic, purely, 407 
attractive infinitesimal generator 

in particle system, 657 
autonomous differential equation 

stochastic, 672 


ball 
closed, 693 
open, 693 

Baxter, John, 725 

Bayes Theorem, 730 

Bernoulli distribution, 29, 60 
characteristic function, 213 
mean, 60 
moment generating function, 221 
multivariate, 152 
probability generating function, 60 
variance, 60 

Bernoulli product measure 
in particle system, 656 


Bessel function 
modified, first kind, 292 
best linear estimator, 402 
beta density, 37, 61 
beta distribution, 37, 246, 543 
density, 61 
mean, 56, 61 
variance, 61 
bilateral exponential distribution, 210 
characteristic function, 212 
bilateral geometric distribution, 217 
binomial coefficient, 688 
binomial distribution, 38, 60, 245 
characteristic function, 213 
mean, 44, 60 
moment generating function, 221 
negative, 155 
normal approximation to, 233 
Poisson limit of, 245 


probability generating function, 60 


second moment, 44 
variance, 60 
Birkhoff Ergodic Theorem, 558 
birth 
in particle system, 642 
birth rate 
of particle system, 642 
of birth-death process, 638 
birth rates 
of pure-birth process, 637 
birth-death process, 638 
birth-death sequence, 516, 532 
Blackwell, David, 654, 732, 733 
Blackwell-MacQueen urn, 550, 551 
Blumenthal 0-1 Law, 610 
Blumenthal, R. M., 725 
bold play, 486 
Borel σ-field, 9
Borel Lemma, 78, 110 
conditional, 409 
Borel set, 9 
Borel space, 418 
Borel subset, 9 
Borel-Cantelli Lemma, 79, 111 
conditional, 422 
boundary, 693, 697 
boundary condition 
for the Dirichlet problem, 683 
boundary values 
for the Dirichlet problem, 679 
bounded, 694 
totally, 694 


bounded convergence, 628 


Bounded Convergence Theorem, 108, 110, 251
bounded operator, 629
bounded rates
for pure-jump Markov process, 632
in particle system, 644
bounded slope, 673 
boundedly, 628 
box, 10 
open, 10 
branching distribution, 517 
branching process, 517, 524, 638 
extinction time, 638 
Brownian local-time process, 672 
Brownian motion, 370, 380, 389 
change of scale, 381 
degenerate, 390 
standard, 390 
symmetry, 381 
time inversion, 381 
time shift, 381 
with drift, 390 


cadlag, 601 
cadlag function, 621, 692 
random, 622 
cadlag space, 621 
canonical sample space, 514 
Cantor distribution, 39, 118 
mean, 58 
second moment, 58 
Cantor function, 727 
capacity 
Newtonian, 571 
cardinality of a set, 688 
Catalan number, 73 
Cauchy 
in probability, 201, 354 
Cauchy density, 35 
Cauchy distribution, 30, 248, 332 
asymmetric, 728 
characteristic function, 212 
Cauchy process, 609 
Cauchy sequence 
in metric space, 694 
Cauchy type, 293 
Cauchy-Schwarz Inequality, 62, 110 
Cauchy-Schwarz inequality 
conditional, 456 
ceiling, 689 
cellular automata, 519 


centered lattice, 286 
centered lattice distribution, 286 
centering 
of distributions, 263 
Central Limit Theorem, 275 
Multi-dimensional, 363 
Chain Rule for densities, 116 
change of variables, 706 
Change of Variables Proposition, 117 
Change of Variables Theorem, 136 
Chapman-Kolmogorov equations, 626 
characteristic exponent, 300, 304 
characteristic function, 209 
Bernoulli, 213 
bilateral exponential, 212 
binomial, 213 
Cauchy, 212 
characterization of, 269 
delta, 213 
exponential, 212 
for R^d, 234
gamma, 212 
Gaussian, 212 
geometric, 213 
negative binomial, 213 
normal, 212 
of convolution, 215 
Poisson, 213 
triangular, 212 
uniform, 212 
Chebyshev Inequality, 62, 111 
Chebyshev inequality 
conditional, 455 
Chen, Robert W., 723 
chord 
random, 23, 38, 58, 61 
Chung, Kai-Lai, 731 
Classical Central Limit Theorem, 275 
closed ball, 693 
closed set, 693, 697 
closure, 693, 697 
coefficient 
of Ornstein-Uhlenbeck process, 677 
of stochastic difference equation, 662 
of stochastic differential equation, 672 
coin-flip space, 5, 8, 15, 17, 20, 30, 38- 
40, 66, 77, 79, 107, 124, 127, 534, 
557, 595 
Extension Theorem applied to, 95 
column matrix, 688 
commutative group, 152 
commutative semigroup, 160 


compact, 694, 698 
locally, 581 
relatively, 694, 698 
sequentially, 695 
relatively, 695 
compact space, 698 
compactification 
one-point, 699 
two-point, 699 
compactness 
relative sequential, 255, 356 
complete convergence, 727 
complete metric space, 396, 694 
completely monotone, 266 
completion 
of field, 91, 94 
of measure, 95 
complex function 
derivative of, 711 
complex number, 711 
absolute value of, 711 
exponential of, 711 
imaginary part of, 688, 711 
logarithms of, 711 
real part of, 688, 711 
complex numbers, 688 
composition, 689 
of measurable functions, 13, 51, 52, 
55, 57 
compound Poisson distribution, 290, 300 
infinitely divisible limit of, 295, 301 
compound Poisson process, 604 
concave function, 69 
conditional density, 416 
conditional distribution, 413 
existence of, 418 
notation, 418 
of normal random vector, 427, 730 
regular, 730 
uniqueness of, 418 
conditional expectation, 443, 731 
conditional independence, 422 
conditional probability, 404 
regular, 730 
conditional variance, 453 
conditional versions of results 
Borel, 409 
Borel-Cantelli, 422 
Cauchy-Schwarz, 456 
Chebyshev, 455 
dominated convergence, 449 
Fubini, 431 


Jensen, 450 
monotone convergence, 448 
uniform integrability, 449 
conditionally exchangeable, 435 
configuration 
of particles, 641 
configuration space, 641 
contact process 
with asexual reproduction, 651 
with sexual reproduction, 654 
with threshold birth rates, 653 
continuity in metric space, 695 
Continuity of Measure Theorem, 25, 77, 
110 
Dominated, 107 
Monotone, 106 
Continuity Theorem, 262, 361, 599 
continuous at a point, 701 
continuous function, 690 
convergence 
for Radon measures, 598 
in metric space, 694 
convergence for distribution functions, 244 
convergence for distributions, 244, 254, 352 
convergence in distribution, 250, 352, 690 
of point processes, 598 
convergence in measure, 186 
convergence in probability, 185, 354 
of point processes, 598 
Convergence of Strict Types Theorem, 265 
Convergence of Types Theorem, 265 
convergence theorems 
bounded, 108, 251 
conditional dominated, 449 
conditional monotone, 448 
conditional uniform integrability, 449 
dominated, 107, 187 
Fatou, 106, 187, 252 
martingale, 478, 480 
monotone, 49 
monotone convergence 
for integrals, 105 
monotone convergence for sums, 82 
reverse martingale, 484 
reverse submartingale, 482 
submartingale, 480 
uniform integrability, 108, 187, 252, 
726 
converges boundedly, 628 
convex function, 68 
convex hull, 145, 687 
convex set, 144 


convolution, 148 

characteristic function of, 215 

moment generating function of, 218 

of measures, 493 

probability generating function of, 153 
convolution root, 149, 294 
coordinate-wise convergence, 349 
correlation, 64, 111, 690 

negative, 64 

positive, 64 
cosine integral, 222, 727 
countably additive, 7, 87 
countably many, 6 
counting measure, 84, 118 
coupling, 500, 638 
coupling, universal

of particle systems, 644 
coupon collecting, 439 
covariance, 63, 111, 690 
covariance function, 66, 678 
covariance matrix, 66, 571 
cover, 694 

open, 694 
Cramér-Wold Device, 362 
cyclic threshold process, 652 
cylinder set, 137 


death 
in particle system, 642 
death rate 
of particle system, 642 
of a birth-death process, 638 
decreasing function, 692 
in particle system, 657 
strictly, 692 
decreasing sequence 
of sets, 76 
decreasing set sequence, 691 
strictly, 691 
degenerate Brownian motion, 390 
degenerate type, 31, 263 
degenerate Wiener measure, 390 
delay distribution, 499 
delay time, 499 
delayed regenerative set 
in R+, 616
delayed renewal sequence, 499 
stationary, 518 
delta distribution, 27, 29, 329 
characteristic function, 213 
moment generating function, 221 
density, 35, 116 


and Fubini Theorem, 135 

and independence, 134 

arcsin, 37 

beta, 37, 61 

Cauchy, 35 

Chain Rule, 116 

conditional, 416 

empirical, 534 

Bernoulli case, 535 

exponential, 35, 61 

gamma, 36, 61 

Gaussian, 36, 61 

marginal, 136 

normal, 36, 61 

Reciprocal Rule, 117 

uniform, 35, 61 
derivative 

Radon-Nikodym, 116 
derivative of complex function, 711 
descending ladder time 

of random walk, 507 
destructive random walk, 433 
De Finetti measure, 534, 540, 544, 546 

Bernoulli case, 535 
De Finetti Theorem, 732 

finite case, 536, 545 

infinite case, 540, 547 
Diaconis, Persi, 726, 731 
differential, 669 
diffusion, 661 
diffusion coefficient, 673 
dimension, Hausdorff, 620 
Dirichlet distribution, 157, 550 

some parameters = 0, 551 
Dirichlet problem 

in 1 dimension, 679 

in higher dimensions, 683 
Dirichlet process, 607, 732 
discrete generator, 514 
discrete set, 583 
disjoint 

pairwise, 6 
distance, 693 

L1-, 401

L2-, 399 

in complex plane, 688 

in Euclidean space, 688 

in Hilbert space, 397 

in inner product space, 396 
distribution, 12 

arcsin, 37 

asymmetric Cauchy, 728 


Bernoulli, 29 
multivariate, 152 
beta, 37, 543 
bilateral exponential, 210 
bilateral geometric, 217 
binomial, 38 
negative, 155 
branching, 517 
Cantor, 39, 118 
Cauchy, 30 
asymmetric, 728 
centered lattice, 286 
centering and scaling of, 263 
compound Poisson, 290 
on R+, 300 
conditional, 413 
existence of, 418 
uniqueness of, 418 
delay, 499 
delta, 27 
Dirichlet, 157, 550 
some parameters = 0, 551 
Dirichlet process, 732 
empirical, 544 
equilibrium, 528 
ergodic, 561 
exponential, 35 
extremal, 562 
Ferguson, 551, 732 
Fréchet, 248 
gamma, 36, 146 
Gaussian, 36 
on R^d, 237
GEM, 550, 732 
generalized Poisson, 289 
geometric, 30 
Gumbel, 247 
infinitely divisible, 149, 303 
lattice, 286 
multinomial, 153 
multivariate Bernoulli, 152 
negative binomial, 155 
normal, 36, 146, 727 
on R^d, 237
of random set, 732 
offspring, 517 
Poisson, 38, 589 
compound, 290, 300 
generalized, 289 
two-sided, 292 
posterior, 543 
prior, 542 


random, 413 
shift-invariant, 555 
span of lattice, 286 
stable, 278 
strictly, 278 
stable, index 
step, 489 
strictly stable, 278 
symmetric, 34 


1 
1, 280 


triangular, 216 
two-sided Poisson, 292 
uniform, 16 
waiting time, 489, 500 
Weibull, 248 
Yule-Furry, 220 
zeta, 145, 726 
distribution function, 25, 27, 39, 111 
continuous, 28, 39 
empirical, 195 
for R, 39 
distributions on R 
Extension Theorem applied to, 96 
Dobrushin, R. L., 734 
domain of attraction, 278 
domains of strict attraction, 278 
Dominated Convergence Theorem, 107, 187 
conditional, 449 
domination, stochastic, 569 
Donsker Invariance Principle, 374 
Doob Decomposition Theorem, 465 
Doob Upcrossing Lemma, 479 
dot product, 688 
double or nothing, 469 
drift, 390 
drift coefficient, 673 
drift of Lévy process, 607 
Durrett, Richard, 733 
Dynkin, E. B., 725 


Eaton, Morris L., 730 
edge speed 
in particle systems, 659 
Ehrenfest urn sequence, 532 
empirical density, 534 
Bernoulli case, 535 
limiting, 540 
empirical distribution, 544 
limiting, 546 
empirical distribution function, 195 
empty set, 687 
Engen, S., 732
equicontinuity, 258 


equicontinuous, 696 
uniformly, 696 
equilibrium distribution, 528, 624 
lower, 658 
upper, 658 
equivalence class, 689 
of distributions, 31 
of events, 13 
of random variables, 13, 48 
ergodic 
distribution, 561 
sequence, 561 
ergodic lemma, maximal, 560 
ergodic theorem 
Birkhoff, 558 
Kingman-Liggett, 565 
ergodic theory, 554 
error function, 220 
estimator 
best linear, 402 
Etemadi Lemma, 200 
Euclidean space, 688 
Euler’s constant, 114 
event, 7 
exchangeable, 197, 198 
null, 13 
tail, 196, 198 
exchangeable, 434 
σ-field, 197, 198
conditionally, 435 
event, 197, 198 
exclusion, 652, 656 
existence theorem, 94 
expectation, 45 
conditional, 443, 731 
does not always exist, 45, 56 
of compositions, 51, 52, 55, 57 
of nonnegative random variables, 44, 
58 
of simple random variables, 41 
with respect to probability measure, 
46 
expectation operator, 42, 48, 690 
with respect to probability measure, 
46 
expected value, 44 
explosion, 637 
exponent 
characteristic, 300 
exponential density, 35, 61 
exponential distribution, 35, 245, 255 
characteristic function, 212 


density, 61
limit of geometric, 243
mean, 58, 61
moment generating function, 221
relation to Poisson, 169, 590
variance, 61
exponential of complex number, 711
Extension Theorem, 94, 725
extinction
in particle system, 651
extinction time
of a branching process, 638
extremal distribution, 562
extremal measure, 660


factorial
falling, 155
rising, 155
factorial moments, 72
falling factorial, 155, 688
family
interacting particle system, 644
Markov, 623
family of distributions, 32
Fatou Lemma, 106, 187, 252
Feller process, 627
Feller semigroup, 627
Feller, W., 733
Ferguson distribution, 551, 732
Ferguson, Thomas S., 732
field, 87
field, Galois, 159
filtration, 171, 382
adapted random sequence, 171
minimal, 171, 382
minimal right-continuous, 384
reverse, 464
right-continuous, 384
finite measure, 81
finite range
in particle system, 643
finite-range rates
in particle system, 644
finitely additive, 7, 87
first passage time, 389
first return time, 178
First Wald Identity, 475
first-passage time
for percolation, 570
floor, 689
Fourier transform, 209
Fourier-Stieltjes transform, 209
Fréchet distribution, 248
mean, 249
variance, 249
frequency, 572
Fubini Theorem, 130
applied to densities, 135
conditional, 431
for sums, 82
function
cadlag, 692
continuous, 690
decreasing, 692
harmonic, 521
image of, 691
increasing, 692
indicator, 14
measurable, 11
monotonic, 692
positive definite, 227
right-continuous
increasing, 690
simple, 14
strictly decreasing, 692
strictly increasing, 692
strictly monotonic, 692
strictly positive definite, 227
subharmonic, 521
superharmonic, 521
target of, 691
functional, 374
Fundamental Theorem of Gambling, 485


Galois field, 159
gambler’s ruin, 473
gambling
fundamental theorem of, 485
gamma density, 36, 61
gamma distribution, 36, 245, 255
characteristic function, 212
density, 61
limit of negative binomial, 246
mean, 56, 61
moment generating function, 221
normal approximation to, 233
polar coordinate independence, 146
variance, 61
gamma function, 36, 114
gamma process, 607
Gardiner, Martin, 723
Gaussian density, 36, 61
Gaussian distribution, 36, 246, 329
characteristic function, 212
density, 61 
mean, 61 
on R^d, 237
variance, 61 
Gaussian process, 666, 678 
stationary, 678 
Gaussian sequence, 558 
GEM distribution, 550, 732 
generalized Poisson distribution, 289 
generated 
σ-field, 8
generated by, 123 
generating function, 724 
measure, 493 
moment, 218, 727 
probability, 70, 724 
generating functional 
probability, 594 
generator, infinitesimal, 628, 644, 733
geometric distribution, 30, 60, 245, 255 
characteristic function, 213 
exponential limit of, 243 
mean, 50, 60 
moment generating function, 221 
probability generating function, 60 
relation to return time, 180 
second moment, 50 
variance, 60 
Getoor, R. K., 725 
Gilat, P., 731 
Glivenko-Cantelli Theorem, 195 
Gould, W. H., 731 
Griffeath, David, 733 
Griffiths, R. C., 732 
group 
commutative, 152 
of rotations, 152, 167 
random walk in, 167 
Gumbel distribution, 247 
mean, 247 
Guy, Richard K., 731 


Halmos, Paul R., 725 
harmonic, 521 

Hausdorff dimension, 620 
Hausdorff measure, 620 
Hausdorff metric, 22 
Hausdorff space, 698 

Heath, David, 732 

Herglotz Lemma, 575 
Hewitt-Savage 0-1 Law, 197 
Hilbert space, 396 


Hilbert space span, 402 
Hilbert subspace, 396 
hitting probability, 522 
hitting time, 172, 383 

of point, 386 

of two-point set, 389 


iid, 689 
iid sequence, 164 
image of a point, 691 
image of function, 691 
image of random walk, 206 
imaginary part, 711 
imaginary part of complex number, 688 
improper Riemann-Stieltjes integral, 708 
in probability, 689 
Inclusion-Exclusion Theorem, 80, 724 
increasing function, 568, 692 
in particle system, 657 
strictly, 692 
increasing right-continuous function, 690 
increasing sequence 
of sets, 76 
increasing set sequence, 691 
strictly, 691 
increments 
of a random walk, 165, 166, 175 
independence, 123 
and densities, 134 
conditional, 422 
row-wise, 304 
independent increments, 166, 175, 369, 602 
independent random walks 
particle system, 652 
index 
of Fréchet distribution, 248 
of regular variation, 325 
of stable distribution, 329 
of strictly stable distribution, 329 
of Weibull distribution, 248 
indicator function, 14, 209, 690 
indicator random variable, 15, 49, 66 
induced, 12 
inequalities 
Cauchy-Schwarz, 62 
Chebyshev, 62 
conditional Cauchy-Schwarz, 456 
conditional Chebyshev, 455 
conditional Jensen, 450 
Jensen, 69 
Kolmogorov, 478, 665, 674, 731 
Markov, 62 


infimum, 689 
infinite measure, 81 
infinite measure space 
compared to finite measure space, 110 
infinite product, 710 
of σ-fields, 137
of probability measures, 139 
of probability spaces, 139 
infinite-dimensional cube, 350, 352, 356 
infinitely divisible, 149, 303 
infinitely divisible distribution 
limit of compound Poisson, 295, 301 
infinitesimal generator, 628, 644, 733 
initial condition 
of stochastic differential equation, 672 
initial distribution, 624 
of Markov sequence, 436, 511 
initial state, 514, 624, 630 
initial value 
of stochastic differential equation, 672 
inner product, 688 
inner product space, 396 
integrable 
Riemann-Stieltjes, 704 
integral 
Lebesgue, 102 
Riemann-Stieltjes, 53, 111, 690, 704 
improper, 708 
integration by parts, 707 
intensity measure, 586 
interacting particle system, 644 
family of, 644 
interior, 693, 697 
interval, length of, 688 
Invariance Principle 
Donsker, 374 
invariant measure, 594 
invariant point process, 594 
inverse 
left-continuous, 28 
right-continuous, 28 
inversion formula 
for densities on Z, 227 
for densities on R, 231 
irreducible 
Markov process, 639 
Markov sequence, 527 
transition operator, 527 
irreducible recurrence class, 527 
Isaacson, Dean, 727 
isomorphic measurable spaces, 418 
Itô integral, 664 


Itô Lemma, 683 
Itô Representation Theorem, 606, 608
Itô, Kiyosi, 734
iterated logarithm 
law of, 391, 392 


Jain, Naresh, 726 
Jennrich, Robert I., 726 
Jensen Inequality, 69, 110 
Jensen inequality 
conditional, 450 
jointly measurable, 164 
jump distribution, 635 
jump rate, 635 
jump-rate function, 635 


Kahane, Jean-Pierre, 724 

Kingman, J. F. C., 725, 733 
Kingman-Liggett Ergodic Theorem, 565 
Kochen, Simon, 724 

Kochen-Stone Lemma, 78, 111, 724 
Kolmogorov 0-1 Law, 196 

Kolmogorov Inequality, 478, 665, 674, 731 
Kolmogorov Test, 730 

Kolmogorov Three-Series Theorem, 203, 727 
Kolmogorov, A. N., 723 


Lévy measure, 296, 301, 303 

of Lévy process, 607 
Lévy process, 602, 625 

drift of, 607 

Lévy measure of, 607 

stable, 609 

strictly stable, 609 

with respect to filtration, 610 
Lévy-Khinchin Representation, 299, 302 
ladder time 

of random walk, 507 
Laplace transform, 209 
Laplace-Stieltjes transform, 209 
Large Deviations Theorem, 281 
large numbers 

law of, 65, 192, 273 
lattice, 286 

centered, 286 

shift, 286 

span, 286 
lattice distribution, 286 

centered, 286 
Law of Averages, 59, 65 
Law of Large Numbers, 65, 111, 186, 273 

Strong, 192 


Law of Rare Events, 291 


Law of the Iterated Logarithm, 391, 392 


least squares estimate, 402 
Lebesgue integral, 102, 725 
does not exist, 102 
Lebesgue measure, 16, 17, 725 
d-dimensional, 129 
on R, 84, 98, 111, 117, 118 
translation invariance, 97 
on R^d, 129
on a subset of R, 84
on unit interval, 30, 39
on unit square, 17
one-dimensional, 84
two-dimensional, 98
Lebesgue measure on R^2
translation invariance, 98 
left transition operator 
of Markov sequence, 511 
left-continuous inverse, 28 
length of interval, 688 
Liggett, Thomas, 733 
limit, 701 
limit infimum of sequence 
of sets, 76 


limit of distribution functions, 245, 254 


limit of sequence 
of sets, 76 
limit supremum of sequence 
of sets, 76 
limit type, 265 
strict, 265 
limiting empirical density, 540 
limiting empirical distribution, 546 
Lindeberg-Feller Theorem, 322 
linear estimator 
best, 402 
minimum variance unbiased, 402 
linear operator, 42 
linear span, 402 
linearity 
of conditional expectation, 445 
of expectation, 48, 134 
lines 
space of, 99 
local growth of subordinator, 619 
Local Limit Theorem 
continuous case, 284 
lattice case, 287 
local-time process, 612, 616 
of Brownian motion, 672 
locally compact, 581 


logarithm as complex function, 712 
logarithms of complex number, 711 
lower equilibrium distribution 


in particle system, 658 


MacQueen, James B., 732 
Maitra, Ashok, 730 
majority vote process, 653 
marginal, 359 


finite-dimensional, 359 


marginal density, 136 
Markov chain, 731 
Markov distribution 


strong, 626 
time-homogeneous, 623 


Markov family, 514, 623 


strong, 626 
time-homogeneous, 623 


Markov Inequality, 62, 111 
Markov process 


birth-death, 638 
branching, 638 
irreducible, 639 
pure-birth, 637 
pure-jump, 632, 637 
recurrent, 639 

strong, 611, 626 
time-homogeneous, 623 
transient, 639 


Markov property 


strong, 521 


Markov sequence, 731 


aperiodic, 527 

discrete generator of, 514 
equilibrium distribution, 528 
initial distribution of, 436, 511 
irreducible, 527 

null recurrent, 527 
Ornstein-Uhlenbeck, 577, 578 
period of, 527 

positive recurrent, 527 

related to renewal sequence, 518, 519 
state space of, 511 

stationary, 528, 557, 562 
time-homogeneous, 436, 512 
transient, 527 

transition distribution of, 436 
transition distributions of, 512 
transition operator of, 511, 512 


Marsaglia, George, 726 
Marsaglia, John C. W., 726 
martingale, 459, 478 


continuous-time, 629
reverse, 465
Martingale Convergence Theorem, 478, 480
reverse, 484
martingale problem, 520, 630
matrix
column, 688
definite
positive, 692
strictly positive, 692
positive definite, 67, 692
row, 688
strictly positive definite, 67, 692
transpose of, 688
Maximal Ergodic Lemma, 560
maximum, 689
of Brownian motion, 375
maximum particle number, 641
McCloskey, J. W., 732
McKean, Henry, 733
mean, 44
Bernoulli, 60
beta, 56, 61
binomial, 44, 60
Cantor, 58
exponential, 58, 61
Fréchet, 249
gamma, 56, 61
Gaussian, 61
geometric, 50, 60
Gumbel, 247
normal, 56, 61
Poisson, 51, 60
uniform, 61
Weibull, 249
Yule-Furry, 220
zeta distribution, 145
mean function, 66
mean matrix, 66
mean return time
of a state, 525
mean vector, 66, 571
measurable
jointly, 164
progressively, 382
measurable function, 11
measurable rectangle, 122, 128, 137
measurable set, 6
measurable space, 11
of probability measures, 413
product, 128
measurable spaces
isomorphic, 418
measure, 81
σ-finite, 83
continuity of, 25, 77
counting, 84
De Finetti, 534, 540, 544, 546
Bernoulli case, 535
finite, 81
Hausdorff, 620
infinite, 81
intensity, 586
invariant, 594
Lebesgue, 16
d-dimensional, 129
one-dimensional, 84
two-dimensional, 98
mutually singular, 83
potential, 492
probability, 7
product, 129
Radon, 111, 581
renewal, 491
translation-invariant, 84, 97
uniqueness, 86
Wiener, 370
measure generating function, 493
measure-preserving
transformation, 554
median, 34
mesh, 703
metric, 693
metric space, 693
complete, 396, 694
metrize, 348
minimal filtration, 171, 382
right-continuous, 384
minimum, 689
Minkowski sum, 161, 687
mixing, 563
strongly, 563
weakly, 563
moment generating function, 209, 218, 727
Bernoulli, 221
binomial, 221
characterization, 267
delta, 221
exponential, 221
gamma, 221
geometric, 221
negative binomial, 221
of convolution, 218
Poisson, 221
triangular, 221 
uniform, 221 
Moment Problem, 220 
moments, 68, 72 
about mean, 68 
absolute, 68 
about mean, 68 
binomial distribution, 44 
Cantor distribution, 58 
factorial, 72 
geometric distribution, 50 
Poisson distribution, 51 
uniform distribution, 58 
Monotone Class Theorem, 725 
Monotone Convergence Theorem, 49 
conditional, 448 
for integrals, 105 
for sums, 82 
for variances, 60 
monotonic function, 692 
strictly, 692 
monotonic set sequence, 691 
strictly, 691 
moving averages, 558 
Multi-dimensional Central Limit Theorem, 
363 
multinomial coefficient, 537, 688 
multinomial distribution, 153 
multiplicity of a point, 584 
multiset, 584 
multivariate Bernoulli distribution, 152 
mutually singular, 83, 194 


natural scale, 681 
nearest neighbor random walk, 166 
negative binomial distribution, 155, 246 
characteristic function, 213 
gamma limit of, 246 
moment generating function, 221 
negative binomial process, 607 
negative part, 45, 689, 691 
negatively correlated 
events, 79 
random variables, 64 
neighborhood, 698 
network walk, 518 
Newtonian capacity, 571 
nonanticipating W-functional, 663 
nondegenerate type, 265 


norm 
L1-, 401
L2-, 399 


in L1, 688
in L2, 688 
in Euclidean space, 688 
in Hilbert space, 688 
in inner product space, 396, 688 
normal approximation 
to binomial, 233 
to gamma, 233 
normal density, 36, 61 
normal distribution, 36, 246, 329, 727 
characteristic function, 212 
density, 61 
limit of Poisson, 299 
mean, 56, 61 
on R^d, 237
polar coordinate independence, 146 
variance, 61 
normal random vector 
conditional distribution of, 427, 730 
null event, 13 
null recurrent 
Markov process, 639 
Markov sequence, 527 
random walk, 506 
renewal process, 618 
renewal sequence, 499 
state, 525, 639 
transition operator, 527 


occupation-measure process, 613 
occupation-time process, 612, 617 
occupied site, 641 
offspring distribution, 517 
one-point compactification, 699 
one-sided stationary sequence, 553 
open ball, 693 
open cover, 694 
open set, 693, 697 
operator, 42 
expectation, 42, 48, 690 
linear, 42 
positive, 42, 692 
strictly positive, 48, 692 
Optional Sampling Theorem, 467 
order statistic, 144 
order statistics, 158 
Ornstein-Uhlenbeck process, 677 
stationary, 678 
Ornstein-Uhlenbeck sequence, 577, 578 
orthogonal projection, 397 
Owen, Willis, 727 
Oxtoby, J. C., 729 


pairwise disjoint, 6 
Parseval Formula, 233, 283 
Parseval Relation, 211 
for R, 235 
particle jump, 642 
particle jump process 
with exclusion, 652, 656 
with wrap-around, 652 
particle jump rate, 642 
particle number, 641 
maximum, 641 
partition, 14, 692, 703 
countable, 14 


finite, 14 
point, 54, 703 
mesh of, 703 


refinement of, 703 
Patin, J. M., 726 
Peano curve, 352 
percolation, 570 
first-passage time for, 570 
time constant for, 570 
period, 227 
of Markov sequence, 527 
of state, 525 
of transition operator, 527 
period of renewal sequence, 503 
periodic, 227 
permutation, 83, 197 
point 
image of, 691 
point mass 
unit, 27, 29 
point partition, 54, 703 
mesh of, 703 
refinement of, 703 
point process, 583, 585 


almost sure convergence of, 598 
convergence in distribution, 598 
convergence in probability, 598 
intensity measure of, 586 
invariant, 594 

of lines, 593 

Poisson, 588 

sum of, 597 


Poisson distribution, 38, 60, 245, 255, 589 


characteristic function, 213 
compound, 290, 300 
generalized, 289 

limit of binomial, 245 
mean, 51, 60 


moment generating function, 221 


normal limit of, 299 
probability generating function, 60 
relation to exponential, 169 
second moment, 51 
two-sided, 292 
variance, 60 
Poisson point process, 588 
of lines, 593 
probability generating functional of, 596 
Poisson process, 604 
compound, 604 
standard, 604 
two-sided, 612 
polar coordinates 
random, 146 
Polish space, 347, 419 
Polya urn, 437, 461, 544 
Port, Sidney C., 726 
Portmanteau Theorem, 252, 353 
positive definite 
function, 227, 269 
matrix, 67 
positive definite matrix, 692 
strictly, 692 
positive members of vector spaces, 42 
positive operator, 42, 692 
strictly, 692 
positive part, 45, 689, 691 
positive recurrent 
Markov process, 639 
Markov sequence, 527 
random walk, 506 
renewal process, 618 
renewal sequence, 499 
state, 525, 639 
transition operator, 527 
positively correlated
events, 79
random variables, 64
positivity
of conditional expectation, 444
of expectation, 48
posterior distribution, 543 
potential function, 614, 616 
potential measure, 492, 616 
potential sequence, 492
product of, 495
previsible, 465
prior distribution, 542
probabilistic cellular automata, 519
probability
of an event, 7
probability generating function, 70, 153, 724
Bernoulli, 60 
binomial, 60 
characterization of, 73 
geometric, 60 
of convolution, 153 
Poisson, 60 
probability generating functional, 594 
of Poisson point process, 596 
probability measure, 7 
product, 139 
shift-invariant, 555 
probability measures 
measurable space of, 413 
probability space, 7 
product, 139 
process, 580 
birth-death, 638 
branching, 517, 638 
Cauchy, 609 
compound Poisson, 604 
contact, 651, 653, 654 
cyclic threshold, 652 
Dirichlet, 607 
Feller, 627 
gamma, 607 
Gaussian, 666 
Lévy, 602, 625 
local-time, 612, 616 
local-time, of Brownian motion, 672 
majority vote, 653 
Markov, 623 
pure-jump, 632 
strong, 611 
Markov pure-jump, 637 
negative binomial, 607 
Ornstein-Uhlenbeck, 677 
particle, 652 
particle jump, 652, 656 
point, 583, 585 
Poisson, 604 
Poisson point, 588 
pure-birth, 637 
pure-jump Markov, 632, 637 
renewal, 616 
stable Lévy, 609 
stationary, 572 
Gaussian, 678 
Ornstein-Uhlenbeck, 678 
stationary independent increments, 602 
stochastic, 580
strictly stable Lévy, 609 
strong Markov, 611, 626 
two-sided Poisson, 612 
Wiener, 370 
product 
infinite, 710 
of random matrices, 571 
of topological spaces, 700 
product σ-field, 128
product measure, 129, 139
product of 0 and ∞, 691
product of measurable spaces, 128 
product of probability spaces, 139 
product space, 129, 139 
product topology, 700 
progressively measurable, 382 
Prohorov metric, 363 
Prohorov Theorem, 357 
projection 
in product of Polish spaces, 359 
projection, orthogonal, 397 
proper difference, 691 
proper set difference, 85 
proper subset, 691 
pseudoinverse of a matrix, 428 
pure-birth process, 637 
pure-jump Markov process, 637 
with bounded rates, 632 
pure-tone sequence, 572 
purely atomic, 407 


Radon measure, 111, 581 
Radon-Nikodym derivative, 116, 726 
Chain Rule, 116 
Reciprocal Rule, 117 
Radon-Nikodym Theorem, 116, 447 
random cadlag function, 622 
random chord, 23, 38, 58, 61, 100 
random coefficient, 572 
random distribution, 413 
random function, 20 
random line, 100 
random matrix 
product of, 571 
random number, 16 
random sequence, 18, 164 
adapted to filtration, 171 
iid, 164 
random set, 22, 23, 589 
of lines, 593 
random set, distribution of, 732 
random signs, 205
random variable, 11 
indicator, 15, 49, 66 
simple, 15 
random vector, 17, 726 
random walk, 165, 169, 437 
destructive, 433 
image of, 206 
in group, 167 
in network, 519 
in semigroup, 167 
ladder times of, 507 
nearest neighbor, 166 
null recurrent, 506 
positive recurrent, 506 
recurrence of, 238 
recurrent, 182 
return time of, 506 
simple, 166, 175, 179, 182, 183, 500, 507, 510, 522, 524
stick-breaking, 193, 549 
symmetric, 166 
transient, 182, 506 
zero set of, 506 
random walk with reinforcement, 433 
range 
of rates in particle system, 643 
rates with finite range 
in particle system, 643 
real part, 711 
real part of complex number, 688 
Reciprocal Rule for densities, 117 
rectangle 
measurable, 122, 128, 137 
recurrence 
of random walk on Z, 238 
recurrent 
random walk, 506 
regenerative set, 498 
renewal process, 618 
renewal sequence, 498, 499 
state, 525, 639 
recurrent random walk, 182 
Red and Black, 461, 463, 484 
refinement, 703 
reflection principle, 374 
regeneration property, 490 
regenerative set, 490, 613 
delayed 
in R⁺, 616
in Z⁺, 490
in R⁺, 616
recurrent, 498
transient, 498 
regular conditional
distribution, 730
probability, 730
regular set, 93
regular variation, 324
index of, 325
reinforcement
random walk with, 433
relative sequential compactness, 255, 356 
relative topology, 700 
relatively compact, 694, 698 
relatively sequentially compact, 599, 695 
renewal measure, 491, 614, 616 
renewal process, 613, 616
null recurrent, 618
positive recurrent, 618
recurrent, 618
transient, 618
renewal sequence, 489
aperiodic, 503
delayed, 499
null recurrent, 499
period of, 503
positive recurrent, 499
product of, 495
recurrent, 498, 499
stationary, 502, 558
strong law, 498
transient, 498
Renewal Theorem, 503
for Markov sequences, 526
renewal time, 489
renewals
number of in a set, 491
return probability, 522
return time, 180
and geometric distribution, 180
first, 178
of a state, 525
of random walk, 506
reverse filtration, 464 
reverse martingale, 465 
Reverse Martingale Conv. Theorem, 484 
reverse submartingale, 465 
Reverse Submartingale Conv. Theorem, 482 
reverse supermartingale, 464 
Riemann surface, 735 
Riemann zeta function, 145 
Riemann-Lebesgue Lemma, 282 
Riemann-Stieltjes
integrable, 704
integral, 704
improper, 708
sum, 704
lower, 704
upper, 704
Riemann-Stieltjes integral, 53, 111, 690
Riesz Representation Theorem, 725
right transition operator
of Markov sequence, 512
right-continuous filtration, 384
minimal, 384
right-continuous function
increasing, 690
right-continuous inverse, 28
rising factorial, 155, 688
root, convolution, 149, 294
rotation, stationary, 557
rotations
group of, 152, 167
row matrix, 688
row sum, 304
row-wise independence, 304
Rudin, Walter, 725


sample point, 7
sample space, 7
canonical, 514
sampling
optional, 467
sampling integrability conditions, 467
scalar product, 688
scale function, 681
scaling
of Brownian motion, 381
of distributions, 263
second moment
binomial, 44
Cantor, 58
geometric, 50
Poisson, 51
Second Wald Identity, 476
second-order sequence, 571
second-order stationary, 572
semigroup
commutative, 160
random walk in, 167
separable, 694
sequence
potential, 492
renewal, 489
sequence of sets
decreasing, 76
increasing, 76
limit infimum of, 76
limit of, 76
limit supremum of, 76
sequential compactness
relative, 255
sequentially compact, 695
relatively, 599, 695
series of independent rv’s, 272
set difference
proper, 85
set sequence
decreasing, 691
increasing, 691
monotonic, 691
strictly decreasing, 691
strictly increasing, 691
strictly monotonic, 691
sexual reproduction
in particle system, 654
Shahshahani, Mahrdad, 731
Shatland, E. S., 732
shift
of infinitely divisible distribution, 301
shift of a lattice, 286
shift transformation, 554
shift-invariant
σ-field, 555
distribution, 555
probability measure, 555
Sierpinski class, 85
Sierpinski Class Theorem, 86
Sierpinski, W., 724
simple function, 14
simple random variable, 15
simple random walk, 166, 175, 179, 182, 183, 500, 507, 510, 522, 524
simplex
standard (d − 1)-, 534
sine integral, 222, 727
singular, 194
site
of integer lattice, 641
slow variation, 325
space of lines, 99
span
Hilbert, 402
linear, 402
span of a lattice, 286
spectral
distribution, 732
distribution function, 574
measure, 574
spectral measure, 732
spectrum
continuous, 575
pure, 575
point, 575
pure, 575
spectrum of stationary sequence, 574
sphere, 693
Spitzer, Frank, 733
square-integrable
W-functional, 667
stable distribution, 278
index of, 329
of index 1/2, 280, 339, 386
stable Lévy process, 609
standard (d − 1)-simplex, 534
standard basis, 688
standard Brownian motion, 390
standard deviation, 60
standard distribution, 32
standard Wiener measure, 390
Stanley, Richard P., 724
state
absorbing, 531
accessible, 527, 639
aperiodic, 525
null recurrent, 525
period of, 525
positive recurrent, 525
recurrent, 525
return time of, 525
transient, 525
state of a random function, 622
state space, 165, 512, 623
of Markov sequence, 511
stationary Gaussian process, 678
stationary increments, 165, 369, 602
stationary Markov sequence, 528
stationary process
Gaussian, 678
Ornstein-Uhlenbeck, 678
stationary sequence, 518, 553
ergodic, 561
Gaussian, 558
in Hilbert space, 571
Markov, 557, 562
mixing, 563
strongly, 563
weakly, 563
one-sided, 553
Ornstein-Uhlenbeck, 577, 578
pure tone, 572
renewal, 502, 558
second-order, 572
spectrum of, 574
continuous, 575
point, 575
pure continuous, 575
pure point, 575
two-sided, 553
stationary strategy, 484
step, 165
step distribution, 165, 489
stick-breaking random walk, 193, 549
Stirling Formula, 113, 114, 726
Stirling number
second kind, 441, 442
Stirling numbers, 724
first kind, 72
second kind, 72
stochastic difference equation, 662
stochastic differential, 669
stochastic differential equation, 669
autonomous, 672
stochastic domination, 569
stochastic independence, 123
stochastic integral, 669
stochastic process, 20, 572, 580
stationary independent increments, 602
Stone, Charles, 724
stopping time, 172, 382
Strassen, V., 729
strategy, 462
stationary, 484
Strauch, R. E., 731
strict limit type, 265
strict type, 31
strictly decreasing function, 692
strictly decreasing set sequence, 691
strictly increasing function, 692
strictly increasing set sequence, 691
strictly monotonic function, 692
strictly positive definite
function, 227
matrix, 67
strictly positive definite matrix, 692
strictly positive operator, 48, 692
strictly stable distribution, 278
index of, 329
of index 1/2, 280, 339
strictly stable Lévy process, 609
Strong Law
for renewal sequences, 498
Strong Law of Large Numbers, 192 
fails if means do not exist, 193 
strong Markov, 626 
strong Markov distribution, 626 
strong Markov family, 626 
strong Markov process, 611 
strong Markov property, 521 
subharmonic, 521 
submartingale, 460, 478 
reverse, 465 
Submartingale Convergence Theorem, 480 
reverse, 482 
subordinate Lévy process, 611 
subordination, 611 
subordinator, 602 
derivative of, 618 
local growth of, 619 
subspace 
topological, 700 
Sudderth, William, 730-732 
summation Σ (1/k²), 239
summation by parts, 708 
superharmonic, 521 
supermartingale, 460, 478 
reverse, 464 
support 
of a distribution function, 33 
of a probability measure, 33 
support function, 160, 726 
supremum, 689 
survival 
in particle system, 651 
symmetric, 34 
symmetric difference, 75 
symmetric distribution, 34 
symmetric random walk, 166 
symmetrization, 217 
symmetry 
of Brownian motion, 381 
symmetry (metric), 693 


table 
characteristic functions, 212, 213 
densities, 61 
distributions on Z⁺, 60
distributions on R, 61 
gamma function, 114 
means, 60, 61 
moment generating functions, 221 
probability generating functions, 60 
Stirling Formula, 114 
variances, 60, 61 


tail 
σ-field, 196, 198
event, 196, 198
tail σ-field, 555
target of function, 691 
Taylor, S. J., 725 
Three-Series Theorem 
Kolmogorov, 203, 727 
tight, 357 
uniformly, 357, 373 
time 
renewal, 489 
waiting, 489 
time constant 
for percolation, 570 
time inversion 
of Brownian motion, 381 
time parameter 
discrete, 164 
time shift 
of Brownian motion, 381 
time-homogeneous Markov distribution, 623 
time-homogeneous Markov family, 623 
time-homogeneous Markov process, 623 
time-homogeneous Markov sequence, 436, 512
topological space, 697 
Hausdorff, 698 
topological spaces 
product of, 700 
topological subspace, 700 
topology, 697 
product, 700 
relative, 700 
totally bounded, 355, 694 
transform, 218 
transformation 
measure-preserving, 554 
shift, 554 
transient 
Markov process, 639 
Markov sequence, 527 
random walk, 506 
regenerative set, 498 
renewal process, 618 
renewal sequence, 498 
state, 525, 639 
transition operator, 527 
transient random walk, 182 
transition distribution, 623 
of Markov sequence, 436 
transition distributions
of Markov sequence, 512 
transition function, 436 
transition matrix, 516 
transition operator, 512 
aperiodic, 527 
discrete generator of, 514 
irreducible, 527 
left, 511 
null recurrent, 527 
of Markov sequence, 511, 512 
period of, 527 
positive recurrent, 527 
right, 512 
transient, 527 
transition probability, 636 
transition rate, 636 
transition semigroup, 622, 623 
Feller, 627 
translation invariance, 84, 97, 98 
transpose of matrix, 688 
triangle inequality, 693 
triangular array, 304 
triangular distribution, 216 
characteristic function, 212 
moment generating function, 221 
trivial σ-field, 197
0-1, 197
trivial σ-field, 0-1, 561
two-point compactification, 699 
Two-Series Theorem, 727 
two-sided Poisson distribution, 292 
two-sided Poisson process, 612 
two-sided stationary sequence, 553 
Tychonoff Theorem, 700 
type, 31 
convergence of, 265 
convergence of strict, 265 
degenerate, 31, 263 
densities of same, 35 
strict, 31 


uan, 305 
Ulam, S. M., 729 
unbiased linear estimator 
minimum variance, 402 
uncorrelated 
events, 79 
random variables, 64 
uniform continuity in metric space, 695 
uniform density, 35, 61 
uniform distribution, 248 
characteristic function, 212
density, 61 
mean, 61 
moment generating function, 221 
moments, 58 
on (0,1), 16 
on [0,1], 16 
on a finite set, 84 
on a subset of R, 84 
on unit square, 17, 23, 98 
Extension Theorem applied to, 98 
variance, 61 
Uniform Integrability Criterion, 108, 110, 187, 252, 726
conditional, 449 
uniformly asymptotically negligible, 305 
uniformly equicontinuous, 696 
uniformly integrable, 108 
uniformly tight, 357, 373 
Uniqueness of Measure Theorem, 86 
unit point mass, 27, 29 
universal coupling 
of particle systems, 644 
upcrossing, 479 
Upcrossing Lemma, 479 
upper equilibrium distribution 
in particle system, 658 
urn, 4 
Blackwell-MacQueen, 550, 551 
Ehrenfest, 532 
Polya, 437, 461, 544 
to model exchangeable sequence, 536 


vacant site, 641
vague convergence, 727
vanish at ∞, 625
variance, 59, 111, 690 
Bernoulli, 60 
beta, 61 
binomial, 60 
conditional, 453 
exponential, 61 
Fréchet, 249 
gamma, 61 
Gaussian, 61 
geometric, 60 
normal, 61 
Poisson, 60 
uniform, 61 
Weibull, 249 
Yule-Furry, 220
zeta distribution, 145
variation
regular, 324 
slow, 325 
vector space, 395 


waiting time, 489 
waiting time distribution, 489, 500 
walk in network, 518 
weak convergence, 727 
weak* convergence, 727 
Weibull distribution, 248 
mean, 249 
variance, 249 
Weyl Equidistribution Theorem, 563 
Whitt, Lee, 726 
Wiener measure, 370, 374, 380 
degenerate, 390 
on C[0, ∞), 380
standard, 390 
Wiener process, 370, 380 
d-dimensional, 682 
with drift, 390 
Wiener sausage, 571 
Wilf, Herbert S., 724 


Yule-Furry distribution, 220 
mean, 220 
variance, 220 


zero set 
of random walk, 506
zeta distribution, 145, 255, 726 
mean, 145 
variance, 145 


